SAMPLING FLUCTUATIONS RESULTING FROM THE 
SAMPLING OF TEST ITEMS* 


FREDERIC M. LORD 


EDUCATIONAL TESTING SERVICE 


Sampling fluctuations resulting from the sampling of test items rather 
than of examinees are discussed. It is shown that the Kuder-Richardson 
reliability coefficients actually are measures of this type of sampling fluctua- 
tion. Formulas for certain standard errors are derived; in particular, a simple 
formula is given for the standard error of measurement of an individual 
examinee’s score. A common misapplication of the Wilks-Votaw criterion 
for parallel tests is pointed out. It is shown that the Kuder-Richardson 
formula-21 reliability coefficient should be used instead of the formula-20 
coefficient in certain common practical situations. 


1. Introduction 


Suppose that the same test is administered to a large number of separate 
groups of examinees, the groups being random samples all drawn from the 
same population; and suppose that some test statistic is computed separately 
for each sample of examinees. The value obtained for this test statistic will, 
of course, differ from sample to sample because of sampling fluctuations. 
The standard deviation of these values over a very large number of samples 
is the standard error of the test statistic when examinees are sampled. For 
convenience, this type of sampling will be referred to as type-1 sampling. 

On the other hand, suppose that a large number of forms of the same test 
are administered to the same group of examinees, each form consisting of a 
random sample of items drawn from a common population of items; and 
suppose that some test statistic is computed separately for each form of the 
test. Let us assume for theoretical purposes that the examinees do not change 
in any way during the course of testing, i.e., that there is no practice effect, 
no fatigue, etc. The value computed for the test statistic will still, of course, 
differ from form to form because of sampling fluctuations. The standard 
deviation of these values over a very large number of samples is the standard 
error of the test statistic when the test items are sampled. This type of sampling 
will be referred to as type-2 sampling. Test forms constructed by type-2 
sampling will be called randomly parallel forms or randomly parallel tests. 

*Most of the work reported here was carried out under contract with the Office of 
Naval Research. The writer is indebted to Professor S. S. Wilks, who has checked over 
portions of a draft of this paper. 

Type-1 sampling fluctuations are familiar to everyone; type-1 standard 
error formulas have long been available; they are sometimes incorrectly used 
in situations where sampling of test items is of crucial importance. Formulas 
for type-1 and type-2 standard errors may usually be readily distinguished 
on a superficial level by the following characteristics, which underscore the 
essential difference between them: type-1 standard errors are usually obviously 
proportional to some power (positive or negative) of the number of examinees 
in the sample and are usually much less obviously and simply related, if at 
all, to the number of items in the test; type-2 standard errors have the corre- 
sponding characteristic with respect to the number of items in the sample. 

Section 2 of the present paper summarizes notation and lists type-2 
standard error formulas without proof. Section 3 discusses two practical 
illustrative situations in which sampling of items is of crucial importance. 
Section 4 investigates the relation between the standard errors of individual 
examinees’ scores and the Kuder-Richardson reliability coefficients, and 
reaches some important conclusions regarding the formula-21 coefficient. 
Section 5 discusses certain familiar formulas, including the Wilks-Votaw 
criterion for parallel tests, in relation to type-2 sampling formulas. Section 
6 shows that the type-2 sampling distribution of most test statistics will be 
approximately normal when the number of test items is sufficiently large. 
Section 7 gives the derivation of the type-2 standard errors presented in 
section 2. Section 8, finally, discusses simultaneous sampling of items and 
examinees (type-12 sampling) and derives certain standard error formulas 
appropriate for this more complicated situation. 


2. Notation and Summary of Formulas 


In the present study, standard errors are obtained for the following 
test statistics: 


$t_a$ = the observed test score of examinee $a$, obtained by counting the number of items 
answered correctly on a single test. 

$\bar{t}$ = the mean of the scores obtained by the $N$ examinees on a single test. 
$\bar{t} = \sum_a t_a / N$. 

$s_t$ = the standard deviation of the scores obtained by the $N$ examinees on a single test. 
$s_t^2 = \sum_a t_a^2 / N - \bar{t}^2$. 

$r_{21}$ = the Kuder-Richardson reliability coefficient, formula 21. 
$r_{21} = \frac{n}{n-1}\left[1 - \frac{\bar{t}(n - \bar{t})}{n s_t^2}\right]$. 

$r_{20}$ = the Kuder-Richardson reliability coefficient, formula 20. 
$r_{20} = \frac{n}{n-1}\left(1 - \sum_i s_i^2 / s_t^2\right)$ 
(symbols explained in the succeeding list). 

$r_{ct}$ = the correlation of the test score with any external variable, $c$. $r_{ct} = s_{ct}/s_c s_t$. 
Considerable care in defining notation must be taken here in order to 
avoid serious confusion. Additional symbols that will be used are listed below 
for easy reference. 


$x_{ia}$ = the "score" of examinee $a$ on item $i$. 
$x_{ia} = 1$ if item answered correctly, $= 0$ otherwise. 

$n$ = the number of items in a single form of a test, i.e., in a single sample. The subscript 
$i$ runs from 1 to $n$. 

$N$ = the number of examinees in a single group of examinees. The subscript $a$ runs from 
1 to $N$. 

$m$ = the number of items in a finite population of items. 

$p_i$ = the observed "difficulty" of item $i$ for the $N$ examinees tested. 
$p_i = \sum_a x_{ia}/N$. 

$q_i = 1 - p_i$. 

$z_a$ = the "proportion-correct score" of examinee $a$; the proportion of the items in a single 
test answered correctly by examinee $a$. $z_a = t_a/n$. 

$\bar{z}$, $\bar{c}$, etc. = the mean of the $N$ values of $z$, $c$, etc. 
$\bar{z} = \sum_a z_a/N$, etc. 

$M(p)$ = the mean of the $n$ observed values of $p_i$ for the $n$ items in the test administered. 
$M(p) = \sum_i p_i/n$. 

$s_c$, $s_z$, etc. = the standard deviation of the $N$ values of $c$, $z$, etc. 
$s_z^2 = \sum_a z_a^2/N - \bar{z}^2$, etc. 

$s_i$ = the standard deviation of $x_{ia}$ for fixed $i$. 
$s_i^2 = \sum_a x_{ia}^2/N - \left(\sum_a x_{ia}/N\right)^2 = p_i q_i$. 

$s_{ct}$, etc. = the covariance (over examinees) of $c$ and $t$, etc. 
$s_{ct} = s_{tc} = \sum_a (c_a - \bar{c})(t_a - \bar{t})/N$. 

$s_{ic}$, $s_{iz}$, $s_{it}$ = the covariance (over examinees) of $c_a$, $z_a$, or $t_a$, respectively, with $x_{ia}$, for 
fixed $i$. 
$s_{it} = s_{ti} = \sum_a (x_{ia} - p_i)(t_a - \bar{t})/N$. 

$s(p)$ = the standard deviation of the $n$ observed values of $p_i$ for the $n$ items in the test 
administered. 
$s^2(p) = \sum_i p_i^2/n - M^2(p)$. 

$s(s_{iz})$, $s(s_{it})$, etc. = the standard deviation of the $n$ observed values of $s_{iz}$, $s_{it}$, etc. for 
the $n$ items in the test administered. 
$s^2(s_{it}) = \sum_i s_{it}^2/n - \left(\sum_i s_{it}/n\right)^2$. 

$s(s_{ic}, s_{it})$ = the covariance (over items) of $s_{ic}$ and $s_{it}$. 
$s(s_{ic}, s_{it}) = \sum_i s_{ic}s_{it}/n - \left(\sum_i s_{ic}/n\right)\left(\sum_i s_{it}/n\right)$. 

$r_{ic}$, $r_{it}$, $r_{iz}$ = the correlation of $c_a$, $t_a$, or $z_a$, respectively, with $x_{ia}$, for fixed $i$. 
$r_{it} = s_{it}/s_i s_t$. 
It should be noted that all the statistics in the foregoing list are observed 
sample statistics relating to a given sample. There are two kinds of statistics 
listed, typified, in the simplest case, by $\bar{z} = \sum_a z_a/N$ and $M(p) = \sum_i p_i/n$. 
Population parameters have not been listed but will be designated, when 
needed, by the use of Greek letters. The following additional symbols, relating 
to the totality of all possible samples of test items (type-2 sampling), will 
be used. 


$E(x)$ = the expected value of $x$; the arithmetic mean of the statistic $x$ over all possible 
samples. 

S.E.$(x)$ = the standard error of the statistic $x$; the standard deviation of the statistic $x$ 
over all possible samples. $\text{S.E.}^2(x) = E(x^2) - [E(x)]^2$. 

var$(x)$ = the sampling variance. $\text{var}(x) = \text{S.E.}^2(x)$. 

cov$(x, y)$ = the sampling covariance of the statistics $x$ and $y$ over all possible samples. 
$\text{cov}(x, y) = E(xy) - E(x)E(y)$. 


Table 1 summarizes the more important of the type-2 standard errors 
derived in the present paper. For purposes of comparison, the last column of 
the table, when appropriate, gives the corresponding usual type-1 formulas 
for the standard error for the case where the test scores are assumed to be 
normally distributed. The standard error formulas in both columns are large- 
sample formulas, in general, and observable sample statistics have been 
substituted for the corresponding population values throughout. 

Type-12 standard errors are not listed here; their treatment is left for 
a special section. 


3. Illustrative Examples and Discussion of the Standard Errors 


Suppose that Form A of a certain 135-item test has been administered. 
Several parallel forms of this same test are to be administered in the future. 
Each form is administered to a different group of examinees. The groups of 
examinees may be considered as random samples drawn from the same 
population. Each group is so large that differences between groups due to 
sampling of examinees may be ignored. It is found that the mean, standard 
deviation, and Kuder-Richardson formula-20 reliability of the scores on 
Form A are 63.5, 21.5, and 0.95, respectively. How much may we expect 
the means to vary from form to form? 

The required value of s(p) could, of course, be determined directly from 
item analysis data. However, this value can be calculated, by means of (1), 
from the three numerical values given in the preceding paragraph. (1) is 
readily obtained by solving for $s^2(p)$ in Tucker's modification (9) of the usual 
formula for the Kuder-Richardson formula-20 reliability coefficient. 


$$s^2(p) = s_z^2\left[(n - 1)\,r_{20} - n\right] + \bar{z} - \bar{z}^2. \tag{1}$$


Here $\bar{z} = \bar{t}/n$ and $s_z = s_t/n$. We find that $s^2(p) = .0538$. 
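The arithmetic behind this value can be checked with a short sketch. It assumes the form of (1) obtained by solving the formula-20 coefficient for the variance of the item difficulties, with the mean and standard deviation first rescaled to proportion-correct units; the function name `item_difficulty_variance` is illustrative only.

```python
def item_difficulty_variance(n, mean_t, sd_t, r20):
    """Variance of item difficulties, s^2(p), recovered from test-level statistics.

    Assumes s^2(p) = s_z^2 [(n - 1) r20 - n] + z_bar - z_bar^2,
    where z_bar = mean_t / n and s_z = sd_t / n.
    """
    z_bar = mean_t / n          # mean proportion-correct score
    s_z2 = (sd_t / n) ** 2      # variance of the proportion-correct scores
    return s_z2 * ((n - 1) * r20 - n) + z_bar - z_bar ** 2


# Form A of the 135-item test described in the text.
s2_p = item_difficulty_variance(n=135, mean_t=63.5, sd_t=21.5, r20=0.95)
print(round(s2_p, 4))  # 0.0538
```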
The large-sample estimate of the type-2 standard error of the mean is 
found to be $\text{S.E.}_2(\bar{t}) = 2.7$. (The subscript "2" is used here, and the 
subscript "1" is used below, to indicate type-2 and type-1 standard errors, 
respectively. Hereafter, type-2 sampling will be understood, unless otherwise 
specifically indicated.) If the same test were administered to random groups 
of 135 examinees, the type-1 standard error would be $\text{S.E.}_1(\bar{t}) = 1.8$. 

On the basis of the foregoing, we may expect that parallel forms of the 
test would not differ from each other in mean score by as much as $2\sqrt{2}\,\text{S.E.}_2(\bar{t})$ 
= 7.6 points more than one time in twenty. If the parallel forms are carefully 
constructed by matching items from form to form on difficulty and item-test 
correlation rather than by random sampling of items, it may well be that 
the forms will not differ from each other as much as the foregoing formulas 
would indicate. 
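A minimal sketch of these three quantities follows. It assumes the large-sample form in which the type-2 standard error of the mean raw score is the square root of n times the variance of the item difficulties (the proportion-score result of section 7 rescaled by n), and the ordinary type-1 form in which the score standard deviation is divided by the square root of the number of examinees. The variable names are illustrative.

```python
import math

n_items = 135        # items per form
s2_p = 0.0538        # variance of item difficulties found from (1)
sd_t = 21.5          # standard deviation of scores on Form A
n_examinees = 135    # group size used for the type-1 comparison

se2_mean = math.sqrt(n_items * s2_p)        # type-2: test items are sampled
se1_mean = sd_t / math.sqrt(n_examinees)    # type-1: examinees are sampled
# Two standard errors of the difference between the means of two independent forms.
max_form_gap = 2 * math.sqrt(2) * se2_mean

print(round(se2_mean, 1), round(max_form_gap, 1))  # 2.7 7.6
```

The type-1 figure computed this way is close to 1.85, in line with the rounded value quoted in the text.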

Suppose, for example, it is desired to investigate the relation of length of 
reading passage to validity in a reading comprehension test. The experimenter 
might well select at random from a pool of all available reading items of some 
specified difficulty level (a) a sample of all items based on passages containing 
more than 200 words and (b) a sample based on passages containing less than 
100 words (it is assumed here that there is only one item per reading passage). 
He then places these items in random order and administers them to a group 
of examinees, obtaining separate scores for the long and for the short items. 
He computes the validity of each score, using some available criterion. If 
the two validity coefficients differ by little more than the type-2 standard 
error of their difference, it seems likely that the difference is attributable to 
chance fluctuations due to the sampling of items. If they differ by several 
times this standard error, the opposite conclusion may be reached; insofar 
as other uncontrolled experimental variables are ruled out, the difference may 
plausibly be attributed to length of reading passage. 

A note of caution is necessary in using the type-2 standard error formulas. 
These formulas involve no assumptions beyond random sampling and large 
n; however, it is not at present known just how large an n is needed in any given 
case. The formulas in Table 1, therefore, should be used with some caution. This 
is particularly true of the last three rows of the table, since the correlation 
coefficients given in the first column undoubtedly have sharply skewed 
distributions when n is small. 

It should, finally, be noted that the assumption of random sampling 
of items cannot be expected to hold for speeded tests, and the formulas given 
in the present paper must be considered inapplicable. 


4. Standard Errors of Measurement and Test Reliability 


Table 1 gives a practical approximation to $\text{S.E.}(t_a)$ in terms of observed 
sample statistics; the rigorously accurate value, as shown in a later section, is 

$$\text{S.E.}^2(t_a) = \frac{1}{n}\,\tau_a(n - \tau_a). \tag{2}$$


Here $\tau_a = E(t_a)$ is the true score of examinee $a$, i.e., the expected value of $t_a$ 
over all randomly parallel forms of the test. [The expectation symbol, $E$, 
denotes the mean value over all type-2 samples; thus the operator $E$ can be 
treated by the same rules as a summation sign.] The standard error of the 
score of an examinee is the standard deviation of the errors of measurement 
of his score (error of measurement = $t_a - \tau_a$). The average, taken over all 
examinees, of the squared values of such standard deviations of errors of 
measurement, 


$$\frac{1}{N}\sum_a \text{S.E.}^2(t_a) = \frac{1}{N}\sum_a E(t_a - \tau_a)^2, \tag{3}$$


may appropriately be compared with the conventional “standard error of 
measurement" of test theory. This latter, which will be denoted by “S.E. 
Meas.," is likewise an average over all examinees. It is conventionally defined 
by the formula 


$$\text{S.E.Meas.} = s_t\sqrt{1 - \text{reliability}}. \tag{4}$$

Specifically, it will now be shown that the squared standard error of 
measurement given by (3) is exactly equal to that which would be expected 
in (4) if the test reliability there were given by the Kuder-Richardson formula-21 
coefficient (6). In our notation, this coefficient is 


$$r_{21} = \frac{n}{n-1}\left[1 - \frac{\bar{t}(n - \bar{t})}{n s_t^2}\right]. \tag{5}$$

The significance of the present proof is that it shows that the Kuder-Richardson 
formula-21 coefficient (and, as will be seen, the formula-20 coefficient also) is no 
more nor less than a measure of the magnitude of type-2 sampling errors (relative, 
of course, to the magnitude of true score differences). 
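Averaging (2) over all examinees, we find 


$$\frac{1}{N}\sum_a \text{S.E.}^2(t_a) = \frac{1}{nN}\sum_a \tau_a(n - \tau_a) = \bar{\tau} - \frac{1}{nN}\sum_a \tau_a^2 = \frac{1}{n}\,\bar{\tau}(n - \bar{\tau}) - \frac{1}{n}\,\sigma_\tau^2, \tag{6}$$

where $\bar{\tau}$ and $\sigma_\tau^2$ denote the mean and the variance of the true scores of the $N$ examinees. 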


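From (5) and (4), the expected value of the squared S.E. Meas. is 


$$E[s_t^2(1 - r_{21})] = \frac{1}{n-1}\left[n\bar{\tau} - E(\bar{t}^2) - E(s_t^2)\right]. \tag{7}$$

In order to deal with (7) we first need expressions for $E(s_t^2)$ and $E(\bar{t}^2)$: 

$$E(s_t^2) = E\left[\frac{1}{N}\sum_a (t_a - \bar{t})^2\right] = \frac{1}{N}E\left[\sum_a \{(t_a - \tau_a) + (\tau_a - \bar{\tau}) - (\bar{t} - \bar{\tau})\}^2\right]. \tag{8}$$

After squaring and rearranging $E$ and $\sum$ signs, 

$$E(s_t^2) = \frac{1}{N}\Big[\sum_a E(t_a - \tau_a)^2 + \sum_a (\tau_a - \bar{\tau})^2 + N E(\bar{t} - \bar{\tau})^2 + 2\sum_a E\{(t_a - \tau_a)(\tau_a - \bar{\tau})\} - 2E\{(\bar{t} - \bar{\tau})\sum_a (t_a - \tau_a)\} - 2E\{(\bar{t} - \bar{\tau})\sum_a (\tau_a - \bar{\tau})\}\Big]. \tag{9}$$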

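Now the fourth and the last terms on the right vanish since $E(t_a - \tau_a)$ and 
$\sum_a (\tau_a - \bar{\tau})$ both equal zero. It is seen that we have, term for term, 


$$E(s_t^2) = \frac{1}{N}\sum_a \text{var}(t_a) + \sigma_\tau^2 + \text{var}(\bar{t}) + 0 - 2\,\text{var}(\bar{t}) - 0. \tag{10}$$

Now var$(t_a)$ is given by (2), so that 

$$E(s_t^2) = \frac{1}{nN}\sum_a \tau_a(n - \tau_a) + \sigma_\tau^2 - \text{var}(\bar{t}). \tag{11}$$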


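Finally, proceeding as in (6), we have 


$$E(s_t^2) = \frac{1}{n}\,\bar{\tau}(n - \bar{\tau}) + \frac{n - 1}{n}\,\sigma_\tau^2 - \text{var}(\bar{t}). \tag{12}$$


Next, by the definition of var$(\bar{t})$, 

$$E(\bar{t}^2) = \text{var}(\bar{t}) + \bar{\tau}^2. \tag{13}$$

From (7), (12), and (13), 

$$E[s_t^2(1 - r_{21})] = \frac{1}{n}\,\bar{\tau}(n - \bar{\tau}) - \frac{1}{n}\,\sigma_\tau^2. \tag{14}$$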

This result is the same as that in (6). We have shown that the average squared 
standard error of measurement found in type-2 sampling is exactly equal to 
the expected value of the squared S.E. Meas. derived from the formula-21 
Kuder-Richardson reliability coefficient. 
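The equality just proved can be verified exactly on a toy case by enumerating every possible form. The sketch below assumes a small hypothetical pool of four kinds of items, three examinees, and forms of three items drawn with replacement (the finite stand-in for sampling from an infinite pool); the pool itself is invented for illustration.

```python
from itertools import product

# Hypothetical pool: one row per kind of item, one column per examinee (1 = correct).
pool = [
    (1, 1, 0),
    (1, 0, 0),
    (0, 1, 1),
    (1, 1, 1),
]
n = 3                 # items per randomly parallel form
N = len(pool[0])      # examinees
m = len(pool)         # kinds of items, all equally frequent

# True score of each examinee: n times the proportion of pool items answered correctly.
tau = [n * sum(item[a] for item in pool) / m for a in range(N)]
# Average over examinees of the squared standard error tau_a (n - tau_a) / n.
avg_sq_se = sum(t * (n - t) / n for t in tau) / N

# Expected value of s_t^2 (1 - r21) over all equally likely forms.
forms = list(product(range(m), repeat=n))
total = 0.0
for form in forms:
    scores = [sum(pool[i][a] for i in form) for a in range(N)]
    mean_t = sum(scores) / N
    var_t = sum((t - mean_t) ** 2 for t in scores) / N
    # s_t^2 (1 - r21) rewritten as [mean_t (n - mean_t) - s_t^2] / (n - 1),
    # which avoids dividing by s_t^2 when a form yields identical scores.
    total += (mean_t * (n - mean_t) - var_t) / (n - 1)
expected_sq_se_meas = total / len(forms)

print(abs(avg_sq_se - expected_sq_se_meas) < 1e-9)  # True
```

The agreement holds for any pool and any n, since the derivation uses nothing beyond random sampling of items.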
The logical relation between Kuder-Richardson formulas 20 and 21 
can be derived from (1) and (5), from which it is readily found that 


$$s_t^2(1 - r_{20}) = s_t^2(1 - r_{21}) - \frac{n^2}{n - 1}\,s^2(p). \tag{15}$$

Now the term on the left and the first term on the right of (15) are the squared 
standard errors of measurement computed from $r_{20}$ and from $r_{21}$, respectively. 
Furthermore, since $n s^2(p)/(n - 1)$ is the unbiased small-sample estimate of 
the population variance $\sigma^2(p)$, it is seen that the last term on the right is the 
small-sample estimator for the squared standard error of the mean score 
[see (22)]. Consequently, we may rewrite (15) as 


$$(\text{S.E.Meas.}_{20})^2 = (\text{S.E.Meas.}_{21})^2 - \text{S.E.}^2(\bar{t}). \tag{16}$$


The difference between $r_{20}$ and $r_{21}$, as made apparent in (16), arises 
from the fact that some randomly parallel forms are, by chance, composed 
of harder-than-average items, or of easier-than-average items; consequently, 
the mean of the actual scores on any given test is not exactly equal to the 
mean of the true scores for the same examinees. The use of $r_{20}$ is appropriate 
whenever one is willing to ignore any difference between the mean test score of the 
group and their mean true score, i.e., when one is concerned only with the relative 
rather than the absolute size of the scores of the group. On the other hand, $r_{21}$ 
should be used whenever one is concerned with the actual magnitude of the errors 
of measurement, e.g., whenever there is a predetermined cutting score which 
divides the examinees into passing and failing groups. 
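The relation between the two coefficients can be checked directly on any matrix of right/wrong responses. The sketch below assumes the usual definitions of the formula-20 and formula-21 coefficients with variances computed using divisor N, and verifies that the two squared standard errors of measurement differ by n²s²(p)/(n − 1); the response matrix and the function name `kr_summary` are hypothetical.

```python
def kr_summary(x):
    """x[a][i] = 1 if examinee a answered item i correctly, else 0."""
    N, n = len(x), len(x[0])
    scores = [sum(row) for row in x]
    mean_t = sum(scores) / N
    var_t = sum((t - mean_t) ** 2 for t in scores) / N          # s_t^2
    p = [sum(row[i] for row in x) / N for i in range(n)]         # item difficulties
    mean_p = sum(p) / n
    var_p = sum(pi ** 2 for pi in p) / n - mean_p ** 2           # s^2(p)
    r20 = n / (n - 1) * (1 - sum(pi * (1 - pi) for pi in p) / var_t)
    r21 = n / (n - 1) * (1 - mean_t * (n - mean_t) / (n * var_t))
    return n, var_t, var_p, r20, r21


# Hypothetical responses of five examinees to four items.
x = [
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
n, var_t, var_p, r20, r21 = kr_summary(x)
lhs = var_t * (1 - r20)                                   # squared S.E. Meas. from formula 20
rhs = var_t * (1 - r21) - n ** 2 * var_p / (n - 1)        # formula 21 minus the mean-score term
print(abs(lhs - rhs) < 1e-9, r20 >= r21)  # True True
```

Because s²(p) cannot be negative, the formula-21 coefficient never exceeds the formula-20 coefficient, and the two coincide only when all items are equally difficult for the group tested.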

The foregoing treatment brings to our attention the very important 
fact that $\text{S.E.}(t_a)$ is actually the same as the traditional standard error of measurement 
of the individual examinee's score. The first formula in the second column 
of Table 1 thus provides a very simple way of computing this important 
quantity. 


5. Comparison with Certain Standard Formulas 


A formula closely related to (4) is the following, adapted from (66) of 
reference (8), which will appear familiar to most readers: 


$$\text{S.E.}(\bar{t}) = \frac{s_t}{\sqrt{N}}\sqrt{1 - \text{reliability}}. \tag{66'}$$

The question arises as to why $\text{S.E.}(\bar{t})$ in (66') has a totally different 
formula from that given in Table 1 for the type-2 standard error of the 
mean. If we use (66') to determine whether or not two forms of a test yield 
significantly different mean scores, we will always find the difference to be 
significant provided only that we take a sufficiently large number of examinees 
(N) for our experiment. This is true because the standard error in (66') is 
inversely proportional to $\sqrt{N}$; the standard error vanishes when N is large. 
(66') represents the sampling fluctuations of the mean that would be observed 
if the same test were administered to successive samples of N examinees so 
chosen that the distribution of true scores was the same in each sample. If 
the same test is administered twice to the same group of examinees, (66’) 
could be used in investigating the significance of the difference between the 
mean scores obtained on the two testings, provided it can be assumed that 
there is no practice effect. In this case, there is only one test involved, and 
there is thus no sampling of test items. Obviously, (66') should not be used 
when there is sampling of items; a type-2 standard error is required. 

Consider next Wilks' (11) and Votaw's (10) procedures when either of 
these is used as a criterion of “parallelism” in tests, as suggested by Gulliksen 
(3, Ch. 14). Gulliksen defines “parallel” tests as having equal means, equal 
variances, and equal intercorrelations with each other and with all external 
criteria (as well as satisfying appropriate non-statistical criteria of parallelism). 
Wilks’ and Votaw’s significance tests provide rigorous statistical criteria for 
“parallelism” under this definition. They could appropriately be applied if 
identically the same tests were administered twice to the same examinees, 
provided it could be assumed that no practice effect had occurred. It would 
not be very desirable, however, to apply Wilks’ or Votaw's procedures to 
data such as were obtained in the second illustrative example given in section 
3. If a test composed of items having a certain characteristic is to be compared 
with a test composed of different items having a second characteristic, it 
may not be very useful to set up the null hypothesis that the two tests are 
strictly interchangeable in every way. Such a null hypothesis will always be 
rejected if N is sufficiently large, but the rejection of this hypothesis does not 
necessarily imply that the first and second characteristics have different 
effect, since the observed discrepancy might be readily accounted for as no 
greater than would be expected to be found in comparing two randomly 
parallel tests composed of the same kind of items. 


6. Sampling Distributions of Test Statistics 


It remains only to present the derivations of the results that have up 
to now been quoted without proof. The derivations are based on the assertion 
that there is a definite response ($x_{ia}$) that a given examinee will make to a 
given item. The nature of this response may or may not be known in advance. 
The group of N examinees to whom the items or tests are administered is a 
fixed group not subject to sampling fluctuation or other changes. 

The responses of the N examinees to item $i$ may be specified by the 
column vector $x_i = \{x_{i1}, x_{i2}, \dots, x_{iN}\}$. Since each item response is assumed 
to be treated as either "right" or "wrong," $x_{ia} = 0$ or 1, and there are exactly 
$2^N$ possible different vectors, i.e., different patterns of item response. If we 
let the subscript $I = 1, 2, 3, \dots, 2^N$, then these possible patterns are 
represented by the $2^N$ vectors $x_I$. If two items have exactly the same pattern of 
responses, i.e., if the response of each examinee is the same on both items, 
then the two items are wholly indistinguishable in the present situation. It 
may therefore be asserted without loss of generality that, for present purposes, 
any infinite pool of items is composed of $2^N$ different kinds of items, designated 
by the $2^N$ vectors $x_I$. The relative frequencies of occurrence of the different 
kinds of items are therefore the only parameters needed to describe 
completely any infinite pool; these parameters will be denoted by $\pi_I$, the relative 
frequencies of occurrence of the patterns $x_I$. 

When a random sample of n test items is drawn from the pool, the 
probability that the resulting n-item test will be composed of $n_1$ items of 
the first kind, $n_2$ items of the second kind, ..., $n_I$ items of the $I$th kind, ..., 
$n_{2^N}$ items of the $2^N$th kind is given by the standard multinomial distribution 
(7, pp. 58-59): 


$$f(n_1, n_2, \dots, n_{2^N}) = \frac{n!}{\prod_I n_I!}\prod_I \pi_I^{n_I}. \tag{17}$$


It can be shown (1, p. 419) that the quantities $V_I = (n_I - n\pi_I)/\sqrt{n\pi_I}$ 
are asymptotically normally distributed for large n with zero means and 
with the (singular) variance-covariance matrix $\mathbf{I} - \mathbf{r}\mathbf{r}'$, where $\mathbf{I}$ is the 
identity matrix and $\mathbf{r}$ is the column vector $\{\sqrt{\pi_1}, \sqrt{\pi_2}, \dots, \sqrt{\pi_{2^N}}\}$. 


Now, the test score of individual $a$ is $z_a = \sum_i x_{ia}/n = \sum_I x_{Ia} n_I/n$, the 
$x_{Ia}$ being given constants, 0 or 1, not subject to sampling fluctuation; or, in 
terms of $V_I$, 


$$z_a = \sum_I \pi_I x_{Ia} + \frac{1}{\sqrt{n}}\sum_I \sqrt{\pi_I}\,x_{Ia}V_I.$$


The first term on the right is $\zeta_a = \tau_a/n$, the "true" proportion-correct score; 
so that, finally, 


$$\sqrt{n}\,(z_a - \zeta_a) = \sum_I \sqrt{\pi_I}\,x_{Ia}V_I.$$


It is thus seen that the N variables $\sqrt{n}\,(z_a - \zeta_a)$ are asymptotically jointly 
multinormally distributed, each with a mean of zero, a variance which turns 
out to be $\zeta_a(1 - \zeta_a)$, and covariances $\zeta_{ab} - \zeta_a\zeta_b$, where $\zeta_{ab}$ is the proportion 
of all items answered correctly by both examinee a and examinee b. It follows 
immediately that the large-sample standard error of $z_a$ is $\sqrt{\zeta_a(1 - \zeta_a)/n}$ 
[cf. (2)]. The derivation of these and other standard errors will be left to the 
following section. 
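The variances and covariances quoted above can be confirmed by brute force on a small case. The sketch assumes a hypothetical pool of five equally frequent kinds of items, three examinees, and two-item forms drawn with replacement; only the normality is asymptotic, so the second moments should agree exactly even for so short a test.

```python
from itertools import product

# Hypothetical pool: one row per kind of item, one column per examinee (1 = correct).
pool = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1), (0, 0, 1)]
m, N, n = len(pool), 3, 2

# True proportion-correct scores and joint proportions for pairs of examinees.
zeta = [sum(item[a] for item in pool) / m for a in range(N)]


def zeta_joint(a, b):
    """Proportion of pool items answered correctly by both a and b."""
    return sum(item[a] * item[b] for item in pool) / m


forms = list(product(range(m), repeat=n))   # every equally likely n-item form


def z(form, a):
    """Proportion-correct score of examinee a on one form."""
    return sum(pool[i][a] for i in form) / n


def cov_z(a, b):
    """Exact type-2 sampling covariance of z_a and z_b."""
    mean_a = sum(z(f, a) for f in forms) / len(forms)
    mean_b = sum(z(f, b) for f in forms) / len(forms)
    return sum((z(f, a) - mean_a) * (z(f, b) - mean_b) for f in forms) / len(forms)


ok = all(
    abs(n * cov_z(a, b) - (zeta_joint(a, b) - zeta[a] * zeta[b])) < 1e-9
    for a in range(N)
    for b in range(N)
)
print(ok)  # True
```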

By a well-known theorem, if $f(z_1, z_2, \dots, z_N)$ is a function of the $z_a$ 
having continuous first-order partial derivatives with respect to each $z_a$ at 
the point $(\zeta_1, \zeta_2, \dots, \zeta_N)$, and if at least one of these derivatives is 
non-vanishing at this point, then the quantity 

$$\sqrt{n}\,[f(z_1, z_2, \dots, z_N) - f(\zeta_1, \zeta_2, \dots, \zeta_N)]$$

is asymptotically normally distributed with zero mean when n is sufficiently 
large. This theorem assures us that the mean score ($\bar{z}$ or $\bar{t}$), the standard deviation 
of the scores ($s_z$ or $s_t$), the Kuder-Richardson formula-21 reliability ($r_{21}$), and 
the test validity ($r_{cz}$ or $r_{ct}$), are approximately normally distributed in type-2 
sampling with large n; and in addition gives us the large-sample expected 
value of each statistic. It seems highly likely that the Kuder-Richardson 
reliability, formula 20, likewise is asymptotically normally distributed, 
but no proof of this conclusion is available at present, in view of the fact that 
the formula for this statistic involves $\sigma^2(p)$, which is not a function of the $z_a$. 
The foregoing proof of asymptotic normality follows a line of reasoning 
that would require n to be very large except when N is very small, viz., N = 2. 
The nature of the situation, however, gives excellent reason to suppose that 
normality is approximated more quickly than the line of proof suggests when 
N > 2. No rigorous proof of this fact has been found. 


7. Derivations of Expected Values and Standard Errors 
The Individual Score 


The proportion of the items in the entire pool to which examinee a will 
give the correct answer is, by definition, $\zeta_a = \tau_a/n$. If we concern ourselves 
with only a single examinee, the number of correct responses that he gives on 
one sample of items is not correlated with the number that he gives on other 
samples. If n items are drawn at random from the pool, $t_a$, the score of 
examinee a on the resulting test, i.e., the number of items that he will answer 
successfully, will of necessity have the usual binomial distribution with 
mean and variance 


$$E(t_a) = \tau_a, \tag{18}$$

$$\text{S.E.}^2(t_a) = \frac{1}{n}\,\tau_a(n - \tau_a) = n\,\zeta_a(1 - \zeta_a). \tag{19}$$


This conclusion (and also those that follow, except as large n may be assumed) 
depends on no assumptions whatever except that of random sampling. (19) is 
identical with (2), which was discussed in a previous section. If the observed 
value $t_a$ is substituted for the unknown $\tau_a$ in (19), we obtain the square of the 
first formula of Table 1. 
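As a small illustration, the substitution just described gives an estimate that depends only on the observed score and the test length. The function name is illustrative, and the example score of 63 on a 135-item test is hypothetical.

```python
import math


def score_standard_error(t_a, n):
    """Binomial (type-2) standard error of a number-right score.

    Uses (19) with the observed score t_a in place of the unknown true score.
    """
    return math.sqrt(t_a * (n - t_a) / n)


print(round(score_standard_error(63, 135), 2))  # 5.8
```

The estimate is largest for scores near n/2 and falls to zero for scores of 0 or n, where the observed score is a poor stand-in for the true score.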

For finite sampling, when n items are drawn without replacement from 
a finite pool of m items, the corresponding formulas, stated without proof, are 


$$E(t_a) = \tau_a, \tag{18'}$$


$$\text{S.E.}^2(t_a) = \frac{m - n}{n(m - 1)}\,\tau_a(n - \tau_a). \tag{19'}$$
The Mean Score of the Group Tested 


It should be noted that the scores of examinees a and b are not in- 
dependent over different parallel forms of the test. If a particular form happens 
to be composed of rather difficult items, both examinees will tend to get low 
scores; if a particular form happens to be easy, both will tend to score higher. 
Consequently, although the expected value of the mean score in the group is 
equal to the mean of the expected values of the individual scores, i.e., 


$$E(\bar{t}) = \frac{1}{N}\sum_a \tau_a = \bar{\tau}, \tag{20}$$


the standard error of the mean is not an average of the standard errors of the 
individual scores. 

It will be convenient from this point on to work with $z_a = t_a/n$, the 
proportion-correct score, rather than with $t_a$ itself. The nature of the desired 
standard error follows immediately from the fact that the mean score ($\bar{z}$) is 
identically equal to the average item difficulty 


$$\bar{z} = M(p). \tag{21}$$


The usual formulas for the standard error of a mean apply to $M(p)$, so that 

$$\text{S.E.}^2(\bar{z}) = \frac{1}{n}\,\sigma^2(p), \tag{22}$$


where $\sigma(p)$ is the standard deviation of the item difficulties over the whole 
pool of items. If the observed value of $s^2(p)$ is substituted for the unknown 
$\sigma^2(p)$, and if $\bar{t}/n$ is substituted for $\bar{z}$, the square of the second formula of 
Table 1 is obtained. [(19) is a special case of (22), being obtained when 
$p_i = x_{ia}$.] 

In sampling from a finite pool of m items, the corresponding formula, 
stated without proof, is 

$$\text{S.E.}^2(\bar{z}) = \frac{m - n}{n(m - 1)}\,\sigma^2(p). \tag{22'}$$


We may note that $\sigma(p)$ for a given set of items, and hence $\text{S.E.}_2(\bar{z})$ for a 
given test, will be higher when N is small than when N is large. Suppose, for 
example, that all items have the same difficulty ($p$) for a very large group of 
examinees, so that for this group $\sigma(p) = 0$. If the same items are administered 
to a smaller group of examinees drawn at random from the larger, the observed 
values of $p_i$ in the smaller group will differ from each other because of type-1 
sampling fluctuations, and $\sigma(p)$ will be greater than zero. In the extreme case 
where N = 1, the observed values of $p_i$ are of necessity either 0 or 1, and 
$\sigma(p)$ is at a maximum. 
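Both sampling schemes for the mean can be checked by enumerating every possible form drawn from a small pool. The sketch below assumes a hypothetical pool of five item difficulties and three-item forms; for the without-replacement case it uses the textbook finite-population factor (m − n)/(m − 1), with the pool variance computed using divisor m, which is an assumption of this illustration.

```python
from itertools import combinations, product

p_pool = [0.2, 0.35, 0.5, 0.6, 0.9]   # hypothetical item difficulties, pool of m = 5 items
m, n = len(p_pool), 3                  # pool size and items per form


def variance(values):
    """Variance with divisor len(values)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


sigma2_p = variance(p_pool)            # variance of difficulties over the whole pool

# Sampling variance of the mean proportion-correct score, M(p), over all forms.
with_repl = variance([sum(form) / n for form in product(p_pool, repeat=n)])
without_repl = variance([sum(form) / n for form in combinations(p_pool, n)])

print(abs(with_repl - sigma2_p / n) < 1e-9)                                  # True
print(abs(without_repl - (m - n) / (n * (m - 1)) * sigma2_p) < 1e-9)         # True
```

Drawing without replacement from a pool only slightly larger than the test sharply reduces the form-to-form variation of the mean, which is the practical content of the finite-pool correction.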
The Standard Deviation of the Scores of the Group Tested 


In order to obtain the standard error of $s_z^2$, we first use the formula for 
the variance of a sum to write 


$$s_z^2 = \frac{1}{n^2}\sum_h \sum_i s_{hi}, \tag{23}$$


$s_{hi}$ being the covariance between item $h$ and item $i$. Then, again from the 
formula for the variance of a sum, 


$$\text{var}(s_z^2) = \frac{1}{n^4}\sum_h \sum_i \sum_j \sum_k \text{cov}(s_{hi}, s_{jk}), \tag{24}$$

where "cov" stands for the sampling covariance 

$$\text{cov}(s_{hi}, s_{jk}) = E(s_{hi}s_{jk}) - E(s_{hi})E(s_{jk}).$$


Grouping the sums in (24), we obtain 


$$\text{var}(s_z^2) = \frac{1}{n^4}\Big[\sum_{(h \ne i \ne j \ne k)}^{(n^4 - 6n^3 + 11n^2 - 6n)} \text{cov}(s_{hi}, s_{jk}) + 2\sum_{(i \ne j \ne k)}^{(n^3 - 3n^2 + 2n)} \text{cov}(s_i^2, s_{jk}) + 4\sum_{(i \ne j \ne k)}^{(n^3 - 3n^2 + 2n)} \text{cov}(s_{ij}, s_{ik}) + 4\sum_{(i \ne j)}^{(n^2 - n)} \text{cov}(s_i^2, s_{ij}) + \text{other sums containing no more than } n^2 \text{ terms each}\Big]. \tag{25}$$


Here the first sum is over all sets of four subscripts no two of which are the 
same, etc. The coefficient 2 of the second sum arises from combining the two 
equivalent expressions $\sum \text{cov}(s_i^2, s_{jk})$ and $\sum \text{cov}(s_{jk}, s_i^2)$. The 
other numerical coefficients arise similarly. The polynomials in n written 
above the summation signs indicate the number of terms involved in the 
summation. 

Now, the terms under each set of summation signs in (25) are all the 
same no matter what the numerical values of the subscripts; consequently 


$$\text{var}(s_z^2) = \frac{1}{n^4}\left[(n^4 - 6n^3 + 11n^2 - 6n)\,\text{cov}(s_{hi}, s_{jk}) + 2(n^3 - 3n^2 + 2n)\,\text{cov}(s_i^2, s_{jk}) + 4(n^3 - 3n^2 + 2n)\,\text{cov}(s_{ij}, s_{ik}) + O(n^2)\right], \tag{26}$$


where $O(n^2)$ stands for terms of order $n^2$. In (26) and in the following paragraph 
it is understood that $h \ne i \ne j \ne k$. 

Now, $s_{hi}$ and $s_{jk}$ fluctuate independently over successive samples, so 
that cov$(s_{hi}, s_{jk}) = 0$. The same is true of $s_i^2$ and $s_{jk}$. Consequently, 

$$\text{var}(s_z^2) = \frac{4}{n^4}(n^3 - 3n^2 + 2n)\,\text{cov}(s_{ij}, s_{ik}) + O\!\left(\frac{1}{n^2}\right) = \frac{4}{n}\,\text{cov}(s_{ij}, s_{ik}) + O\!\left(\frac{1}{n^2}\right). \tag{27}$$


Equation 27 gives the desired result, but not in a very useful form, since 
cov$(s_{ij}, s_{ik})$ is a function of population parameters and is generally not 
known. As a final step, then, it will be shown that $s^2(s_{iz})$, the actual variance 
(over items 1 to n) of the observed item-test covariances, provides a consistent 
estimate of cov$(s_{ij}, s_{ik})$; it will be proved that 


$$E[s^2(s_{iz})] = \text{cov}(s_{ij}, s_{ik}) + O\!\left(\frac{1}{n}\right). \tag{28}$$
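From the formula for the covariance of a sum, 

$$s_{iz} = \frac{1}{n}\sum_j s_{ij}; \tag{29}$$

$$s^2(s_{iz}) = \frac{1}{n^2}\sum_j \sum_k s(s_{ij}, s_{ik}), \tag{30}$$


the term under the summation sign being the actual covariance (over items 
1 to n) of the observed values of $s_{ij}$ and $s_{ik}$: 


$$s(s_{ij}, s_{ik}) = \frac{1}{n}\sum_i s_{ij}s_{ik} - \frac{1}{n^2}\Big(\sum_h s_{hj}\Big)\Big(\sum_i s_{ik}\Big). \tag{31}$$

Substituting from (31) into (30), and taking expected values, we find 

$$E[s^2(s_{iz})] = \frac{1}{n^3}\sum_i \sum_j \sum_k E(s_{ij}s_{ik}) - \frac{1}{n^4}\sum_h \sum_i \sum_j \sum_k E(s_{hj}s_{ik}). \tag{32}$$


Grouping the sums on the right, we have 


$$E[s^2(s_{iz})] = \frac{1}{n^3}\Big[\sum_{(i \ne j \ne k)}^{n(n-1)(n-2)} E(s_{ij}s_{ik}) + O(n^2)\Big] - \frac{1}{n^4}\Big[\sum_{(h \ne i \ne j \ne k)}^{n(n-1)(n-2)(n-3)} E(s_{hj}s_{ik}) + O(n^3)\Big]. \tag{33}$$

Now, the terms under each summation sign in (33) are the same regardless 
of the numerical value of the subscript. Furthermore, as already pointed 
out in deriving (27), cov$(s_{hj}, s_{ik}) = 0$ when $h \ne i \ne j \ne k$, or in other words, 
$E(s_{hj}s_{ik}) - E(s_{hj})E(s_{ik}) = 0$, or $E(s_{hj}s_{ik}) = E(s_{ij})E(s_{ik})$. Consequently, 

$$E[s^2(s_{iz})] = E(s_{ij}s_{ik}) - E(s_{ij})E(s_{ik}) + O\!\left(\frac{1}{n}\right). \tag{34}$$


But this is the same as (28), which was to be proved. 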
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The large sample standard error of s? may therefore be estimated from 
the actual variance of the observed item-test covariances: 


SE) = 2а). (35) 


Ву means of the “delta” method (5, Vol. 1, рр. 208 ff.), it is readily 
shown from (35) that in large samples 


SE/6) = SE) = Peg. (36) 


If t/n is substituted for z in (36), the square of the third equation of 
Table 1 is obtained. 
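
As an illustration only, equations (35) and (36) can be evaluated from a matrix of 0/1 item responses. The sketch below is not part of the original derivation: the function name `item_sampling_se`, the layout `responses[a][i]`, and the use of divisor-N and divisor-n ("actual") variances are assumptions made for the example.

```python
def item_sampling_se(responses):
    """Type-2 (item-sampling) standard errors from equations (35) and (36).

    responses[a][i] is 1 if examinee a answered item i correctly, else 0
    (an assumed layout, chosen for this illustration).
    Returns (s_z^2, S.E.^2 of s_z^2, S.E.^2 of s_z).
    """
    N = len(responses)          # number of examinees
    n = len(responses[0])       # number of items
    # Relative score z_a = (number right) / n for each examinee.
    z = [sum(row) / n for row in responses]
    z_bar = sum(z) / N
    s2_z = sum((za - z_bar) ** 2 for za in z) / N
    # Observed item-test covariance s_iz for each item (divisor N).
    s_iz = []
    for i in range(n):
        p_i = sum(row[i] for row in responses) / N
        cov = sum((row[i] - p_i) * (za - z_bar)
                  for row, za in zip(responses, z)) / N
        s_iz.append(cov)
    # Actual variance, over the n items, of the item-test covariances.
    mean_cov = sum(s_iz) / n
    var_cov = sum((c - mean_cov) ** 2 for c in s_iz) / n
    se2_variance = 4.0 * var_cov / n        # equation (35)
    se2_sd = var_cov / (n * s2_z)           # equation (36); needs s_z > 0
    return s2_z, se2_variance, se2_sd
```

Both standard errors depend on the data only through the spread of the item-test covariances, so a test whose items all covary equally with the total score would show no item-sampling fluctuation in its score variance under this approximation.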


The corresponding squared standard error for sampling from finite 
populations may be shown to be 

$$\mathrm{S.E.}^2(s_z^2) = 4\,\frac{m - n}{mn}\,s^2(s_{iz}). \tag{37}$$


The Kuder-Richardson Reliability Coefficient, Formula 20 


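Let the usual formula for $r_{20}$, the Kuder-Richardson formula-20 
coefficient, be rewritten as follows: 

$$r_{20} = \frac{n}{n-1}\Bigl(1 - \frac{R}{n}\Bigr), \tag{38}$$

where 

$$R = \frac{1}{n}\sum_i s_i^2 \Big/ s_z^2 = M/s_z^2, \text{ say.}$$

In the extraordinary case where $s_z = 0$, we will agree not to try to compute 
any value of $r_{20}$. The “delta” method may now be used to obtain the result 

$$\operatorname{var} R = \frac{1}{s_z^4}\operatorname{var} M + \frac{M^2}{s_z^8}\operatorname{var} s_z^2 - \frac{2M}{s_z^6}\operatorname{cov}(M, s_z^2). \tag{39}$$

Now $\operatorname{var}(s_z^2)$ is already known from (35). $\operatorname{Var}(M)$ can be evaluated by the 
usual formula for the standard error of a mean: 

$$\operatorname{var} M = \frac{1}{n}\,s^2(s_i^2), \tag{40}$$

where $s^2(s_i^2)$ is the actual variance of the observed item variances. Finally, 
it is readily shown, by methods similar to those used in evaluating $\operatorname{var}(s_z^2)$, 
that 

$$\operatorname{cov}(M, s_z^2) = \frac{2}{n}\,s(s_i^2, s_{iz}), \tag{41}$$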




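where $s(s_i^2, s_{iz})$ is the actual covariance between the observed item variances 
and the observed item-test covariances. Consequently, 

$$\operatorname{var} R = \frac{1}{n s_z^4}\bigl[s^2(s_i^2) + 4R^2 s^2(s_{iz}) - 4R\,s(s_i^2, s_{iz})\bigr]. \tag{42}$$

Now $\operatorname{var}(r_{20}) = \operatorname{var}(R)/n^2$; hence, to order $1/n^3$, 

$$\mathrm{S.E.}^2(r_{20}) = \frac{1}{n^3 s_z^4}\bigl[s^2(s_i^2) + 4n^2(1 - r_{20})^2 s^2(s_{iz}) - 4n(1 - r_{20})\,s(s_i^2, s_{iz})\bigr]. \tag{43}$$

It may be noted that the quantity $(1 - r_{20})$ is of order $1/n$, because 
$\lim_{n \to \infty} n(1 - r_{20}) = \text{constant}$. It is then seen from (43) that $\mathrm{S.E.}^2(r_{20})$ is a quantity 
of order $1/n^3$. Equation 43 leads directly to the fourth formula of Table 1. 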

It may be shown that the corresponding standard error when sampling 
from a finite population is (m — n)/m times the value given in (43). 
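
The following sketch shows how (38) and (43) might be evaluated for 0/1 item data. It is offered as an illustration under stated assumptions rather than as the author's computing routine: the name `kr20_with_se`, the layout `responses[a][i]`, and the use of divisor-N variances (so that the item variance is $p_i(1 - p_i)$) are choices made for the example.

```python
def kr20_with_se(responses):
    """KR-20 via equation (38) and its item-sampling S.E.^2 via equation (43).

    responses[a][i] is 1 if examinee a passed item i, else 0 (assumed layout).
    """
    N = len(responses)
    n = len(responses[0])

    def mean(xs):
        return sum(xs) / len(xs)

    def cov(xs, ys):
        mx, my = mean(xs), mean(ys)
        return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

    z = [sum(row) / n for row in responses]            # relative scores
    s2_z = cov(z, z)                                   # needs s_z > 0
    items = [[row[i] for row in responses] for i in range(n)]
    item_var = [cov(u, u) for u in items]              # s_i^2 = p_i (1 - p_i)
    s_iz = [cov(u, z) for u in items]                  # item-test covariances

    M = mean(item_var)
    R = M / s2_z
    r20 = n / (n - 1) * (1 - R / n)                    # equation (38)

    bracket = (cov(item_var, item_var)
               + 4 * n ** 2 * (1 - r20) ** 2 * cov(s_iz, s_iz)
               - 4 * n * (1 - r20) * cov(item_var, s_iz))
    se2 = bracket / (n ** 3 * s2_z ** 2)               # equation (43)
    return r20, se2
```

The bracketed quantity is the variance over items of $s_i^2 - 2n(1 - r_{20})s_{iz}$, so it cannot be negative apart from rounding error.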


The Kuder-Richardson Reliability Coefficient, Formula 21 


By a procedure wholly parallel to that used for the formula-20 reliability 
coefficient, it is found that, approximately, 


$$\mathrm{S.E.}^2(r_{21}) = \frac{1}{n^3 s_z^4}\bigl[(1 - 2\bar z)^2 s^2(p_i) + 4n^2(1 - r_{21})^2 s^2(s_{iz}) - 4n(1 - r_{21})(1 - 2\bar z)\,s(p_i, s_{iz})\bigr], \tag{44}$$

where $s(p_i, s_{iz})$ is the actual covariance between the observed item difficulties 
and the observed item-test covariances. Equation (44) leads directly to the 
fifth formula of Table 1. 

The standard error of the split-half reliability coefficient has not been 
worked out. It must, however, be larger than the standard error of $r_{20}$ given 
by (43), since $r_{20}$ is the mean of the split-half coefficients from all possible 
splits, as shown by Cronbach (2). 


The Validity Coefficient 
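If c is an outside criterion, 

$$r_{zc} = \frac{s_{zc}}{s_z s_c}. \tag{45}$$

By the “delta” method, 

$$\operatorname{var} r_{zc} = \frac{1}{s_z^2 s_c^2}\operatorname{var} s_{zc} + \frac{r_{zc}^2}{4 s_z^4}\operatorname{var} s_z^2 - \frac{r_{zc}}{s_z^3 s_c}\operatorname{cov}(s_{zc}, s_z^2). \tag{46}$$

It is found that 

$$\operatorname{var} s_{zc} = \frac{1}{n}\,s^2(s_{ic}); \tag{47}$$

$$\operatorname{cov}(s_{zc}, s_z^2) = \frac{2}{n}\,s(s_{ic}, s_{iz}). \tag{48}$$

Finally, 

$$\mathrm{S.E.}^2(r_{zc}) = \frac{1}{n s_z^2}\Bigl[\frac{1}{s_c^2}\,s^2(s_{ic}) - \frac{2 r_{zc}}{s_z s_c}\,s(s_{ic}, s_{iz}) + \frac{r_{zc}^2}{s_z^2}\,s^2(s_{iz})\Bigr]. \tag{49}$$

Equation (49) leads directly to the last formula of Table 1. 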

The corresponding standard error for sampling from a finite pool of 
items is presumably (m — n)/m times the foregoing quantity. 


8. Simultaneous Sampling of Items and Examinees 


Simultaneous and independent sampling of items and examinees might 
be called matrix sampling instead of type-12 sampling. [A generalized approach 
to this problem is reported in (4).] Here, both the population and the sample 
may be thought of as matrices. Each row of the population matrix may be 
taken as representing one test item, and each column as representing one 
examinee. The elements of the matrix are taken to be 1’s and 0’s, depending 
upon whether or not the examinee would answer the item correctly if it were 
administered to him. The actual responses given by a random sample of 
examinees to a test consisting of a random sample of items can be thought 
of as constituting a rectangular matrix composed of n rows and N columns 
selected independently and at random from the population matrix. 

Let y be any statistic calculated from the sample matrix. Consider all 
possible n X N matrices that can be formed from the population matrix by 
a process of omitting entire rows and columns. $\operatorname{var}_{12} y$, the type-12 sampling 
variance of y, is, by definition, equal to the variance of the y values calculated 
from all possible such n × N matrices, i.e., 

$$\operatorname{var}_{12} y = E_{12}(y - E_{12}y)^2, \tag{50}$$

where $E_{12}$ indicates that the expectation of the directly following quantity 
is to be taken over all possible n × N matrices. (The convention of always 
following each expectation symbol with parentheses or brackets will be 
dropped.) 

Equation (50) may be made more convenient by application of a very 
familiar lemma from analysis of variance, which states that the “total sum 
of squares” is equal to the “within sum of squares” plus the “among sum of 
squares.” It is immediately found that 

$$\operatorname{var}_{12} y = E_1[E_{2 \cdot 1}(y - E_{2 \cdot 1}y)^2] + E_1(E_{2 \cdot 1}y - E_{12}y)^2, \tag{51}$$

where $E_{2 \cdot 1}$ is the conditional expected value over all possible combinations 
of rows of the population matrix, the columns being held fixed, and $E_1$ is the 
expected value over all possible combinations of columns. In more concise 
notation, (51) becomes 

$$\operatorname{var}_{12} y = E_1(\operatorname{var}_{2 \cdot 1} y) + \operatorname{var}_1(E_{2 \cdot 1}y), \tag{52}$$

where $\operatorname{var}_{2 \cdot 1}$ and $\operatorname{var}_1$ are type-2 and type-1 sampling variances, respectively. 
By symmetry, there is also the alternative equation 

$$\operatorname{var}_{12} y = E_2(\operatorname{var}_{1 \cdot 2} y) + \operatorname{var}_2(E_{1 \cdot 2}y). \tag{53}$$
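
The decomposition in (52) can be checked exactly on a small population matrix by enumerating every n × N submatrix. The sketch below is illustrative only; the function names, the toy population matrix, and the two example statistics are assumptions introduced here, and exact rational arithmetic is used so that the identity holds without rounding.

```python
from fractions import Fraction
from itertools import combinations


def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def variance(values):
    values = list(values)
    m = mean(values)
    return mean((v - m) ** 2 for v in values)


def matrix_sampling_parts(pop, n, N, stat):
    """Return (var_12 y, E_1 var_{2.1} y, var_1 E_{2.1} y) by full enumeration.

    pop[i][a]: rows are items, columns are examinees, as in the text.
    stat maps an n x N submatrix (list of rows) to the statistic y.
    """
    row_sets = list(combinations(range(len(pop)), n))
    col_sets = list(combinations(range(len(pop[0])), N))

    def y(rows, cols):
        return stat([[pop[i][a] for a in cols] for i in rows])

    var_12 = variance(y(r, c) for r in row_sets for c in col_sets)
    # Columns (examinees) held fixed, rows (items) sampled: type-2 given type-1.
    within = mean(variance(y(r, c) for r in row_sets) for c in col_sets)
    among = variance(mean(y(r, c) for r in row_sets) for c in col_sets)
    return var_12, within, among


def mean_score(sub):
    """Mean relative score of the sample matrix."""
    return mean(v for row in sub for v in row)


def score_variance(sub):
    """Variance over examinees (columns) of the number-right scores."""
    return variance(sum(col) for col in zip(*sub))


pop = [[1, 1, 0, 1],
       [1, 0, 0, 1],
       [0, 1, 0, 0],
       [1, 1, 1, 0]]
var_12, within, among = matrix_sampling_parts(pop, 2, 3, mean_score)
print(var_12 == within + among)
```

The identity holds for any statistic, which is why the same function accepts either `mean_score` or `score_variance`; the approximations (54) and (55) that follow are what require large samples.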


If y is a consistent statistic in type-2 sampling, $E_{2 \cdot 1}y$ will not differ 
greatly from y in large samples. This fact suggests that it will often be found 
in large samples that 

$$\operatorname{var}_1(E_{2 \cdot 1}y) \approx \operatorname{var}_1 y. \tag{54}$$

Similarly 

$$E_1(\operatorname{var}_{2 \cdot 1} y) \approx \operatorname{var}_2 y. \tag{55}$$

If (54) and (55) hold to a satisfactory order of approximation, then (52) 
reduces to the very simple result that the type-12 sampling variance is 
approximately equal to the sum of the type-1 and type-2 sampling variances: 

$$\operatorname{var}_{12} y \approx \operatorname{var}_1 y + \operatorname{var}_2 y. \tag{56}$$

A similar statement can be made for (53). 

The simple result represented by (56) can be shown to hold in the case 
of the mean score, $\bar z$ or $\bar t$, and, at least under the assumption that the scores 
are normally distributed, in the case of the standard deviation, $s_z$ or $s_t$. 
Proofs are presented in the following two sections. 

The type-1 sampling variances of the Kuder-Richardson reliability 
coefficients are not known to the writer. Since the type-2 sampling variances 
of $r_{20}$ and $r_{21}$ are of order $1/n^3$ [see (43) and (44)], it seems clear that the 
type-12 sampling variances of these coefficients, to our order of approximation, 
depend only on the unknown type-1 sampling variances. Neither these 
nor the type-12 sampling variances of the reliability or validity coefficients 
have been worked out. 


The Mean Score 


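In the case of the mean relative test score, from (20), (22), and (52), 

$$\operatorname{var}_{12} \bar z = \frac{1}{n} E_1 \sigma^2(p_i) + \operatorname{var}_1 \bar\zeta, \tag{57}$$

where $\sigma^2(p_i)$ is the variance over all items in the population of the values of 
$p_i$ for a given group of examinees, and $\bar\zeta = E_{2 \cdot 1}\bar z$. 
According to the standard formula for the standard error of any mean, 
the last term of (57) is 

$$\operatorname{var}_1 \bar\zeta = \frac{1}{N}\,\sigma_\zeta^2, \tag{58}$$

where $\sigma_\zeta$ is the standard deviation of $\zeta_a$ over the entire population of 
examinees. 

Next, it will be helpful to evaluate the first term on the right of (57): 

$$\frac{1}{n} E_1 \sigma^2(p_i) = \frac{1}{n} E_1(E_2 p_i^2 - \bar\zeta^2) = \frac{1}{n}(E_1 E_2 p_i^2 - E_1 \bar\zeta^2). \tag{59}$$

Now the difference $E_1 p_i^2 - \pi_i^2$, where $\pi_i = E_1 p_i$, is by definition $\operatorname{var}_1 p_i$, 
the usual binomial variance known to equal $\pi_i(1 - \pi_i)/N$. Hence, 

$$E_1 p_i^2 = \frac{N - 1}{N}\,\pi_i^2 + \frac{\pi_i}{N}. \tag{60}$$

Similarly, 

$$E_1 \bar\zeta^2 = \frac{1}{N}\,\sigma_\zeta^2 + Z^2, \tag{61}$$

where $Z = E_1 \zeta_a = E_2 \pi_i = E_{12} \bar z = E_{12} p_i = E_{12} z_a$ is the over-all population 
mean. From (60), 

$$E_2 E_1 p_i^2 = \frac{N - 1}{N}\,E_2 \pi_i^2 + \frac{Z}{N}. \tag{62}$$

As before, 

$$E_2 \pi_i^2 = \sigma_\pi^2 + Z^2, \tag{63}$$

where $\sigma_\pi$ is the standard deviation of the values of $\pi_i$ over the entire 
population of items. From (62) and (63), 

$$E_1 E_2 p_i^2 = \frac{N - 1}{N}(\sigma_\pi^2 + Z^2) + \frac{Z}{N}. \tag{64}$$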


The substitution first of (58) and (59), then of (61) and (64) into (57) 
gives finally 

$$\operatorname{var}_{12}(\bar z) = \frac{1}{nN}\bigl[(n - 1)\sigma_\zeta^2 + (N - 1)\sigma_\pi^2 + Z(1 - Z)\bigr]. \tag{65}$$

Equation (65) gives the exact, small-sample sampling variance of $\bar z$. The same 
result can be obtained from (53), as a check. 
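
Equation (65) can also be checked numerically. The sketch below is an illustration under an explicit assumption: sampling from an infinite population is imitated by drawing items and examinees with replacement from a small population matrix, so that every draw is independent. The function name `check_mean_score_variance` and the toy matrix are hypothetical.

```python
from fractions import Fraction
from itertools import product


def check_mean_score_variance(pop, n, N):
    """Compare the enumerated type-12 variance of the mean score with (65).

    pop[i][a] is 1 if examinee a passes item i (rows items, columns examinees).
    Items and examinees are drawn with replacement, an assumption that stands
    in for the infinite populations of the text.
    """
    m, M = len(pop), len(pop[0])
    pi = [Fraction(sum(row), M) for row in pop]                  # pi_i
    zeta = [Fraction(sum(pop[i][a] for i in range(m)), m)        # zeta_a
            for a in range(M)]
    Z = sum(pi) / m                                              # over-all mean
    var_pi = sum((x - Z) ** 2 for x in pi) / m                   # sigma_pi^2
    var_zeta = sum((x - Z) ** 2 for x in zeta) / M               # sigma_zeta^2
    formula = ((n - 1) * var_zeta + (N - 1) * var_pi + Z * (1 - Z)) / (n * N)

    # Enumerate every ordered sample of n items and N examinees.
    values = []
    for rows in product(range(m), repeat=n):
        for cols in product(range(M), repeat=N):
            total = sum(pop[i][a] for i in rows for a in cols)
            values.append(Fraction(total, n * N))
    mean_value = sum(values) / len(values)
    enumerated = sum((v - mean_value) ** 2 for v in values) / len(values)
    return enumerated, formula


pop = [[1, 1, 0],
       [1, 0, 0],
       [1, 1, 1]]
enumerated, formula = check_mean_score_variance(pop, 2, 2)
print(enumerated == formula)
```

Exact fractions are used so that agreement between the enumerated variance and (65) is an equality rather than a tolerance check.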


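When terms of order $1/n^2$ and of order $1/N^2$ are neglected in (65) the 
large-sample type-12 sampling variance is found to be approximately 

$$\operatorname{var}_{12} \bar z = \frac{1}{n}\,\sigma_\pi^2 + \frac{1}{N}\,\sigma_\zeta^2. \tag{66}$$

Since $\sigma_\zeta^2 = E_2 s_z^2$ [see (69)], it is easily seen that to our order of 
approximation 

$$\mathrm{S.E.}_{12}^2(\bar z) = \frac{1}{n}\,s^2(p_i) + \frac{1}{N}\,s_z^2. \tag{67}$$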




We thus have the simple result that the type-12 sampling variance of the 
mean test score is equal to the sum of the type-1 and the type-2 sampling 
variances approximately. 


The Standard Deviation of Scores 
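In the case of $s_z^2$ we find by dividing (12) by $n^2$ that 

$$E_2 s_z^2 = \frac{1}{n}\,\bar\zeta(1 - \bar\zeta) + \frac{n - 1}{n}\,\sigma_\zeta^2 - \operatorname{var}_2 \bar z; \tag{68}$$

or, dropping terms of order 1/n, 

$$E_2 s_z^2 = \sigma_\zeta^2, \tag{69}$$

as might be expected. 
Also, a standard formula gives the result 

$$\operatorname{var}_1 s_z^2 = \frac{1}{N}\bigl[\mu_4(z) - \sigma_z^4\bigr], \tag{70}$$

where $\mu_4(z)$ and $\sigma_z^4$ are the fourth and the squared second moments of the 
distribution of the scores of all examinees. If we are willing, for the sake of 
simplicity, to assume that these scores are effectually normally distributed, 
then 

$$\operatorname{var}_1 s_z^2 = \frac{2}{N}\,\sigma_z^4. \tag{71}$$

From (71), (69), and (53), approximately, 

$$\operatorname{var}_{12} s_z^2 = \frac{2}{N}\,E_2 \sigma_z^4 + \operatorname{var}_2 \sigma_z^2. \tag{72}$$

From (72), (69), and (35), 

$$\operatorname{var}_{12} s_z^2 = \frac{2}{N}\,\sigma_\zeta^4 + \frac{4}{n}\,s^2(\sigma_{iz}), \tag{73}$$

where $\sigma_\zeta$ is the standard deviation of all true scores and $s^2(\sigma_{iz})$ is the variance 
over n items of the true item-test covariances computed using all examinees 
in the population. To our order of approximation, 

$$\mathrm{S.E.}_{12}^2(s_z^2) = \frac{2}{N}\,s_z^4 + \frac{4}{n}\,s^2(s_{iz}). \tag{74}$$

Under the assumption that z is effectually normally distributed, it is thus 
found that the type-12 sampling variance of $s_z^2$ is approximately equal to the 
sum of the type-1 and type-2 sampling variances. 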




REFERENCES 


1. Cramér, H. Mathematical methods of statistics. Princeton Univ. Press, 1946. 
2. Cronbach, L. J. Coefficient alpha and the internal structure of tests. Psychometrika, 
   1951, 16, 297-334. 
3. Gulliksen, H. Theory of mental tests. New York: Wiley, 1950. 
4. Hooke, R. Sampling from a matrix, with applications to the theory of testing. Statistical 
   Research Group, Princeton University. Memorandum Report 53, 1953. (Dittoed.) 
5. Kendall, M. G. The advanced theory of statistics. London: Charles Griffin and Co., 
   1948. 2 vols. 
6. Kuder, G. F. and Richardson, M. W. The theory of the estimation of test reliability. 
   Psychometrika, 1937, 2, 151-160. 
7. Mood, A. M. Introduction to the theory of statistics. New York: McGraw-Hill, 1950. 
8. Peters, C. C. and Van Voorhis, W. R. Statistical procedures and their mathematical 
   bases. New York: McGraw-Hill, 1940. 
9. Tucker, L. R. A note on the estimation of test reliability by the Kuder-Richardson 
   formula (20). Psychometrika, 1949, 14, 117-119. 
10. Votaw, D. F., Jr. Testing compound symmetry in a normal multivariate distribution. 
    Ann. math. Stat., 1948, 19, 447-473. 
11. Wilks, S. S. Sample criteria for testing equality of means, equality of variances, and 
    equality of covariances in a normal multivariate distribution. Ann. math. Stat., 1946, 
    17, 257-281. 


Manuscript received 3/1/54 


Revised manuscript received 5/11/54 


PSYCHOMETRIKA—VOL. 20, No. 1 
MARCH, 1955 


SEPARATION OF DATA AS A PRINCIPLE IN FACTOR ANALYSIS 


CHESTER W. HARRIS 


UNIVERSITY OF WISCONSIN 


Two systems of factor analysis—factoring correlations with units in 
the diagonal cells and factoring correlations with communalities in the 
diagonal cells—are considered in relation to the commonly used statistical 
procedure of separating a set of data (scores) into two or more parts. It is 
shown that both systems of factor analysis imply the separation of the 
observed data into two orthogonal parts. The matrices used to achieve the 
separation differ for the two systems of factor analysis. 


One of the recurring operations in statistical work is that of separating 
data into parts. Probably the most common example of this is that of separat- 
ing the raw score for each of a number of subjects into a deviation score plus 
the mean of the scores for these subjects. The analysis of variance offers 
examples of this practice, since this method of analysis in effect further 
separates such deviation scores into two or more parts, depending upon the 
complexity of the design. Similarly, linear regression theory postulates a 
separation of the data of the dependent variable into two parts and provides 
a method of calculating each. It is at least intuitively evident that factor 
analysis also implies a separation of data into parts; however, the particular 
characteristics of the principle followed in making the separation may not be 
well understood. The purpose of this discussion is to interpret the procedures 
of factor analysis from this point of view. 

Cochran’s theorem (3), particularly Cramér’s discussion of it (4, pp. 
116-18), shows the necessary and sufficient conditions for decomposing a 
sum of squares into orthogonal parts. Consider only the matrices of the 
quadratic forms. For such a decomposition, these matrices satisfy the equation 

$$I = A_1 + A_2 + \cdots + A_k, \tag{1}$$

with the $A_i$ symmetric idempotent matrices that are pairwise orthogonal 
and whose ranks sum to the rank of $I$. Idempotent matrices are singular 
matrices such that $A = A^2$; they may be viewed as being generated by 
incomplete orthogonal matrices, i.e., sets of orthogonal columns. Thus (1) 
implies an orthogonal transformation. The most familiar example of this is 
the decomposition 

$$\sum X^2 = n\bar X^2 + \sum (X - \bar X)^2,$$

where the summation is over n measures. This might be written 

$$XIX' = XA_1X' + XA_2X' \tag{2}$$

with $X$ designating a single row vector of data and $A_1$ and $A_2$ properly 
defined symmetric idempotent matrices. In this case the transformation is 
accomplished by an orthogonal matrix with each element of the first column 
consisting of the positive reciprocal of the square root of n. The matrix $A_1$ 
therefore is square, of order n, with each element the positive reciprocal of 
n. $A_2$ is square and symmetric, with each diagonal element equal to (n − 1)/n, 
and each off-diagonal element equal to the negative reciprocal of n. Aitken 
(1) demonstrates the independence of these forms for samples of normally 
distributed variables. The analysis of variance for a single variable implies 
the decomposition of the matrix $A_2$ in (2) into two or more pairwise orthogonal 
idempotent matrices. For the simplest design, $A_2$ is separated into two parts, 
one associated with the notion of “variance between” and the other with the 
notion of “variance within.” The further separation of the matrix associated 
with “variance between” implies more complicated designs, such as a factorial 
arrangement of groups. Aitken's paper also gives necessary and sufficient 
conditions for the independence of two quadratic forms. 
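
The two matrices described for equation (2) can be written out and checked directly. The following sketch is illustrative; the helper names and the four hypothetical scores are assumptions made here, and exact fractions are used so that idempotence and orthogonality can be tested by equality.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def quad_form(x, A):
    """Quadratic form x A x' for a row vector x."""
    size = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(size) for j in range(size))


def mean_and_deviation_matrices(n):
    """A1 has every element 1/n; A2 = I - A1, as described for equation (2)."""
    A1 = [[Fraction(1, n)] * n for _ in range(n)]
    A2 = [[(1 if i == j else 0) - Fraction(1, n) for j in range(n)]
          for i in range(n)]
    return A1, A2


x = [3, 5, 4, 8]                        # hypothetical scores for n = 4 measures
A1, A2 = mean_and_deviation_matrices(len(x))
total = sum(v * v for v in x)           # X I X', the raw sum of squares
part_mean = quad_form(x, A1)            # n times the squared mean
part_dev = quad_form(x, A2)             # sum of squared deviations
print(total, part_mean, part_dev)
```

Because `A1` and `A2` are each idempotent and their product is the zero matrix, the two quadratic forms add up to the raw sum of squares for any data vector, not only for the scores used here.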

Since the symmetric idempotent matrices detailed in (1) are pairwise 
orthogonal, i.e., $A_iA_j = A_jA_i = 0$, it follows that for any matrix $X$, 
$(XA_i)(XA_j)' = (XA_j)(XA_i)' = 0$. This is a function of the matrices of the 
quadratic forms, and does not imply a particular distribution for the 
population from which $X$ is drawn. Cochran’s theorem shows that the sampling 
distribution of terms such as those on the right of (2) is known if $X$ is a 
random sample from a normal population (univariate case). Bartlett (2) has 
discussed tests of significance for a decomposition of the form 

$$X = XA_i + XA_j, \tag{3}$$

where $X$ is a sample from a multi-variate normal population with mean zero. 
Here, $A_i$ and $A_j$ are two parts of the matrix $A_2$ as it was defined for equation 
(2). Two points have been emphasized. One is that choosing symmetric 
idempotent matrices that are pairwise orthogonal as the matrices of the 
quadratic forms gives a decomposition, as in (3), for which $(XA_i)(XA_j)' = 
(XA_j)(XA_i)' = 0$. Second, under certain assumptions regarding the nature 
of the data, sampling distributions of statistics derived from the parts given 
by such a decomposition are known. The remainder of the paper will consist 
of a discussion of the principle given by this first point in relation to factor 
analysis. 

The following geometric representation of factor analysis is well known. 
For convenience, it will be assumed that the data have been scaled to unit 
variance; this may be a critical assumption from the statistical point of view, 
as Rao (8) shows, and the making of it emphasizes the attempt in this paper 
to describe factor analysis rather generally and not to treat the complicated 
inferential problems. It is possible to regard the n persons as defining a 
space within which are located the k tests. The person axes are assumed to 
constitute a rectangular Cartesian system. Then the n measures for a given 
variable, when put in deviation form and scaled to unit variance, are the 
coordinates of a point in this person space that, when joined to the origin, 
defines the variable or test as a vector. Any factor may also be viewed as a 
unit-length vector located in this person space; such a factor is uniquely 
located by a set of n coordinates defining its end-point or by a set of direction 
cosines with respect to the n person axes. Define $Z$ as a k × n matrix of data, 
such that $ZZ' = R_1$. The matrix of intercorrelations with units in the 
diagonals is designated by $R_1$, and $Z'$ is the conventional transpose of $Z$. 
Let $y$ be a column of direction cosines locating a single factor in the person 
space. Then $Zy$ is the column of k scalar products of variables with this 
factor; these are correlation coefficients here, and may be regarded as a column 
of the factor matrix. Finally, $Z(yy')$ gives the coordinates, with respect to 
the person axes, of the perpendicular projections onto the factor axis of the 
points representing the variables. This expression $Z(yy')$ also is the portion 
of the data, $Z$, that is accounted for by the first factor. In other words, 

$$Z = Z(yy') + Z(I - yy') \tag{4}$$

describes the separation of $Z$ into two parts, one of which is associated with 
the factor that is located in the person space by the column of direction 
cosines, $y$, and the other part a remainder. 

Equation (4) necessarily represents a separation of $Z$ into two orthogonal 
parts. This is true because $yy'$ is idempotent, i.e., $yy'yy' = yy'$, and 
consequently $I - yy'$ also is idempotent; therefore $yy'(I - yy') = (I - yy')yy' 
= 0$. It also is true that the matrix $Z(yy')$ is the least-squares approximation 
of the row $y'$ to the rows of $Z$. This follows from least-squares theory; for a 
summary of the role of symmetrical idempotent matrices in least-squares 
approximations see Harris (6). In general, then, the specification of 
$y$, i.e., the direction cosines of a single factor, provides a separation of the 
data $Z$ into two orthogonal parts, one of which is a least-squares 
approximation. It should be emphasized that $y$ is arbitrary and may 
designate any factor axis in the person space; in other words, $y$ may be 
any one of the indefinitely many possible unit-length vectors in the 
person space. 
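
The separation in (4) can be illustrated numerically. The sketch below is only an illustration: the function name, the small data matrix, and the rational unit vector are hypothetical, and the rows of `Z` are not standardized as the text assumes, since orthogonality of the two parts does not depend on that scaling.

```python
from fractions import Fraction


def split_by_factor(Z, y):
    """Return (Zy, Z yy', Z (I - yy')) for a unit-length column y.

    Z is a k x n list of rows (tests by persons); y holds the n direction
    cosines of a single factor in the person space.
    """
    loadings = [sum(z * c for z, c in zip(row, y)) for row in Z]       # Zy
    fitted = [[load * c for c in y] for load in loadings]              # Z(yy')
    residual = [[z - f for z, f in zip(row, frow)]                     # Z(I - yy')
                for row, frow in zip(Z, fitted)]
    return loadings, fitted, residual


# Hypothetical values: y has unit length because 9/25 + 16/25 + 0 = 1.
y = [Fraction(3, 5), Fraction(4, 5), Fraction(0)]
Z = [[1, 2, 3],
     [2, 0, -1]]
loadings, fitted, residual = split_by_factor(Z, y)

# Cross-products between the two parts: [Z yy'][Z (I - yy')]'.
cross = [[sum(f * r for f, r in zip(frow, rrow)) for rrow in residual]
         for frow in fitted]
print(cross)
```

Every entry of `cross` is zero, which is the orthogonality of the two parts on the right of (4); the same holds for any other unit-length choice of `y`.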
Equation (4) might be written more generally as 

$$Z = ZA + Z(I - A), \tag{5}$$

where $A$ designates a symmetrical idempotent matrix, i.e., $A = A^2$. Every 
symmetrical idempotent matrix may be viewed as the product $YY'$, where $Y$ 
is a set of orthogonal columns, i.e., an incomplete orthogonal matrix. (If $Y$ 
is the complete orthogonal matrix, $A = I$, of course.) Equation (5) then is 
the case of selecting one or more mutually orthogonal unit-length axes in the 
person space as a set of factors; the direction cosines of these factors are given 
by $Y$. As before, the two parts on the right of (5) are uncorrelated and $ZA$ 
is the least-squares approximation of $Y'$ to $Z$. The matrix $Y'$ is, of course, 
also regarded as the set of uncorrelated factor scores, each with unit variance. 
Again it should be emphasized that $Y$ is arbitrary and might designate any 
set of orthogonal axes in the person space. 

So far, then, it has been shown that the specification of one or more 
factors leads to the separation of Z into two orthogonal parts, one of which 
is a particular least-squares approximation. 

The final step is to consider two approaches to factor analysis that differ 
primarily in the way in which the factor space is defined. The nature of these 
two approaches can be illustrated by considering the correlation between 
two variables. If the two variables are viewed as two unit-length vectors 
located in the person space, then the variable space is (at most) a plane, i.e., 
of dimension two. It is possible to define the factor space as identical with the 
variable space; this definition corresponds to choosing to factor the unit 
variances and the intercorrelations of the variables. If the complete inter- 
correlation matrix is non-singular, two factors may be extracted by this 
procedure. They would, necessarily, define the variable space. If only one 
factor is extracted, it would be represented by a line embedded in this variable 
space. Spearman’s approach to this problem differs. His approach defines the 
common-factor space as the line formed by the intersection of two orthogonal 
planes, in each of which lies one of the unit-length vectors. The uncorrelated 
unique factors are defined by lines perpendicular to this single common- 
factor axis and, of course, also lie in these two intersecting planes. For two 
variables that are not correlated perfectly, i.e., are not collinear, the Spearman 
approach necessarily defines the common-factor space as distinct from the 
variable space. This definition corresponds to choosing to factor communali- 
ties and intercorrelations, rather than the correlation matrix with units in 
the diagonals. 

The first approach, i.e., factoring $R_1$, requires that any factor axis be 
embedded in the variable space; as a result, the factor might be located by 
reference to the k axes of the Cartesian system provided by the test vectors, 
as well as by reference to the n person axes. Obviously, the test vectors need 
not form a rectangular reference system. This means, then, that for any such 


scores. Holzinger (7) gives illustrations of this principle, using both the 
centroid and what has since become known as the multiple-group methods 
of factor analysis. Using this approach, it may be pertinent to determine a 
“best” location of a factor. Eckart and Young (5) have shown the nature of 




the best approximation, in a least-squares sense, of a matrix of data, $Z$, by 
another matrix of specified lower rank. Securing this best approximation 
is equivalent to identifying the first r principal-axis factors of $R_1$, where 
r is the specified rank that is lower than the rank of $Z$. Defining the total 
factor space as identical with the total test space and then extracting r 
principal-axis factors from $R_1$ gives a separation of $Z$ into the two parts of 
(5) such that $Z(I - A)Z'$ has a minimum trace compared with its trace for 
any other definition of $A$. That $A$ is well-defined is evident from noting that 
$A$ is generated by the r unit-length characteristic vectors of $Z'Z$ that 
correspond to the r largest characteristic roots of $ZZ'$, which necessarily are the 
same as those of $Z'Z$. The Eckart and Young results therefore show that 
their choice of $Y$ gives a matrix $ZA$ which is not only the least-squares 
approximation of $Y'$ to $Z$, as it must be when $A$ is generated by $Y$, but also a best 
approximation to $Z$. 

Finally, it is evident that the communality principle in factor analysis 
also postulates an equation of the form of (5), since the common factors are 
defined by some set of direction cosines, Y. However, the common factors 
are not embedded in the variable space and consequently the elements of Y 
cannot be calculated from the data, Z. Thomson (9, p. 78) comments on this 
point. This means, then, that when equation (5) is used to describe the com- 
munality principle in factor analysis it must be regarded as a formal equation 
with A = YY’ unknown. If A were known, then a principal-axis resolution 
of ZA into factors and factor scores, 

$$ZA = FS, \tag{6}$$

would lead to the definition of $A$ as $SS'$. This would follow from noting that 
$SS'$ is a unit for multiplication on the right of $ZA$ that is of the same rank 
as $A$ and recalling that a multiplication unit is unique within a group of 
singular matrices. However, this definition is circular, in that $S$, the factor 
matrix of factor scores, necessarily is identical with $Y$; the unknown $A$ 
remains unknown. 

This discussion has emphasized the connection between factor analysis 
and well-known procedures for separating data into two or more parts. 
Following Cochran and Cramér, the separation of data into orthogonal parts 
was formulated in terms of symmetric idempotent matrices as the matrices 
of the quadratic forms. It was then shown that from the geometric view 
of factor analysis the specification of one or more factors is the specification 
of one or more sets of direction cosines that generate a symmetric idempotent 
matrix and that this matrix, $A$, and its annihilator, $(I - A)$, achieve a 
separation of the data. The nature of the matrix $A$ was examined for two 
different approaches to factor analysis. For the first approach, Eckart and 
Young’s results were reviewed to show that a minimum trace of $Z(I - A)Z'$ 
is achieved by the principal-axis factoring of $R_1$. For the communality 
approach, the merely formal character of $A$ was emphasized. Although 
problems of estimation and statistical inference were not considered in this 
paper, this final result lends support to the belief that the communality 
principle poses important problems of statistical estimation. 


REFERENCES 


1. Aitken, A. C. On the independence of linear and quadratic forms in samples of normally 
   distributed variables. Proc. royal Soc. Edinburgh, 1939, 60, 40-46. 

2. Bartlett, M. S. Multivariate analysis. J. royal stat. Soc. Sup., 1947, 9, 176-90. 

3. Cochran, W. G. The distribution of quadratic forms in a normal system with 
   applications to the analysis of variance. Proc. Cambridge phil. Soc., 1934, 30, 178-91. 

4. Cramér, Harald. Mathematical methods of statistics. Princeton, N.J.: Princeton Univ. 
   Press, 1946. 

5. Eckart, Carl, and Young, Gale. The approximation of one matrix by another of lower 
   rank. Psychometrika, 1936, 1, 211-18. 

6. Harris, Chester W. The symmetrical idempotent matrix in factor analysis. J. exp. 
   Educ., 1951, 19, 239-46. 

7. Holzinger, Karl J. Factoring test scores and implications for the method of averages. 
   Psychometrika, 1944, 9, 155-67. 

8. Rao, C. R. Estimation and tests of significance in factor analysis. (mimeographed). 

9. Thomson, Godfrey. The factorial analysis of human ability. Boston: Houghton Mifflin 
   Co., 1950. 4th edition. 


Manuscript received 1/25/54 


Revised manuscript received 4/9/54 




THE CHOICE OF AN ERROR TERM IN 
ANALYSIS OF VARIANCE DESIGNS* 
This article presents a survey of the assumptions which may be made 
in variance designs, a description of the mathematical models which reflect 
these assumptions, and a discussion of the ways in which various experimental 
conditions affect the choice of an error mean square. Particular emphasis is 
laid upon the principles, purposes, and dangers of pooling error mean squares 
in order to raise the power of a test. Specific recommendations are made for 


the rules of procedure for pooling (under various conditions) which produce 
tests with optimum power and error characteristics. 


Among the various treatments of psychological statistics one finds a 
good deal of confusion and discrepancy in the recommended procedures for 
selecting an error term in the analysis of variance (see 4, 5, 7, as examples). 
In all too many cases the obtained significance or insignificance of the ex- 
perimental results depends as much upon the particular statistics text used 
as upon the sampling data. The aim of this paper is to show the possible 
assumptions which may be made in regard to analysis of variance data, some 
of the hypotheses which may be tested, and how these and other factors 
influence the choice of the error term. Because of space limitations, the 
arguments will be restricted to a two-factor (or double classification) arrange- 
ment with m replications per cell. Many of the arguments presented here are 
directly translatable into the more complex designs. 

Unfortunately, the derivations of the proper terms for testing various 
hypotheses under the conditions specified by the assumptions require a good 
deal of mathematical sophistication for their understanding. While references 
are made to the sources in which the proofs may be found, 
this paper is aimed principally at the reader who is less interested in rigorous 
mathematical analysis than in the uses of the material in research design. 
For the purpose of identifying the various potential groups of assumptions 
and demonstrating the proper statistics under these assumptions, we shall 
make use of three mathematical models: linear hypothesis model, components 
of variance model, and mixed model. 

*The writer is indebted to Professors Quinn McNemar and Lincoln Moses of Stanford 
University for reading the manuscript and offering many helpful suggestions and criticisms, 
and to Professor Z. W. Birnbaum of the University of Washington for preliminary 
suggestions. This paper was completed while the author was at Stanford 
University and the Veterans Administration Hospital, Palo Alto. 
The data and definitions of Table 1 will be used throughout the paper. 
In the table, $X_{ijk}$ is the observation in the $i$th row ($i = 1, \ldots, r$) and the 
$j$th column ($j = 1, \ldots, c$) and for the $k$th replication ($k = 1, \ldots, m$); and 

$$X_{ij\cdot} = \text{Observed cell mean} = \frac{1}{m}\sum_{k=1}^{m} X_{ijk},$$

$$X_{i\cdot\cdot} = \text{Observed row mean} = \frac{1}{cm}\sum_{j=1}^{c}\sum_{k=1}^{m} X_{ijk},$$

$$X_{\cdot j\cdot} = \text{Observed column mean} = \frac{1}{rm}\sum_{i=1}^{r}\sum_{k=1}^{m} X_{ijk},$$

$$X_{\cdot\cdot\cdot} = \text{Observed over-all mean} = \frac{1}{rcm}\sum_{i=1}^{r}\sum_{j=1}^{c}\sum_{k=1}^{m} X_{ijk}.$$
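
As a small illustration of these definitions, the four kinds of means can be computed from a nested list of observations. The function name `anova_means`, the layout `X[i][j][k]`, and the numbers below are hypothetical and are not data from the paper.

```python
def anova_means(X):
    """Cell, row, column, and over-all means for X[i][j][k].

    i indexes the r rows, j the c columns, and k the m replications per cell.
    """
    r, c, m = len(X), len(X[0]), len(X[0][0])
    cell = [[sum(X[i][j]) / m for j in range(c)] for i in range(r)]
    row = [sum(sum(X[i][j]) for j in range(c)) / (c * m) for i in range(r)]
    col = [sum(sum(X[i][j]) for i in range(r)) / (r * m) for j in range(c)]
    overall = sum(sum(X[i][j]) for i in range(r) for j in range(c)) / (r * c * m)
    return cell, row, col, overall


# Hypothetical scores: r = 2 rows, c = 2 columns, m = 2 replications per cell.
X = [[[30, 32], [20, 24]],
     [[10, 14], [16, 18]]]
cell, row, col, overall = anova_means(X)
print(cell, row, col, overall)
```

With equal numbers of replications in every cell, as assumed throughout the paper, each row or column mean is also the simple average of the cell means it spans.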


Before dealing with the differences among the various models, let us 
consider the meaning of row, column, and interaction "effects." Each factor 
(or classification) is a characteristic or variable (such as individuals, con- 
ditions, tests, or treatments) which includes a number of different specific 
elements. In our case there are r elements in the data for the row factor and 
c elements in the data for the column factor. It is assumed in the analysis of 
variance that the value of each $X_{ijk}$ observation is derived from two 
contributing sources: one dependent upon the particular row and column elements 
to which the particular unit belongs, the other independent of these elements. 
The first of the two contributing sources includes the row, column, and 
interaction effects; the second includes errors of observation and an over-all 
value constant for all of the observations in the data. Thus, the row effect 
is the magnitude of the contribution of a particular row element to the 
observed values of all units which it encompasses; the column effect is the 
magnitude of the contribution of a particular column element to the observed 
values of all units which it encompasses; and the interaction effect is the 
magnitude of the contribution due to the coming together of a particular 
row element with a particular column element. 

The effects, the over-all value, and the errors of observation are assumed 
to be independent; their sum determines the various observed values. Since 
the mean of the errors of observation is assumed to be zero, and since the 
values within any one cell are assumed to vary only as a result of these 
measurement errors, the average value of $X_{ijk}$ over a great number of
replications within any one cell would be expected to be equal to the sum of its
row, column, and interaction effects plus the over-all value. (In this paper 
the over-all value in the case of the components of variance model is set
equal to zero.) 

For the most part, the above discussion applies only to the pure case in 
which replications are actually exact repetitions of each of the $rc$ conditions.
In many cases it is not feasible to have these exact replications within each 
cell because of effects of such factors as learning and motivation. But in 
using different (although comparable) units within the various cells, one 
introduces sampling errors in addition to the measurement errors. These 
sampling errors, however, can be made self-compensating, in the sense that 
they make an equal contribution (on the average) to all of the mean squares, 
by appropriate sampling and design. The examples in the following sections 
are illustrative of the ways in which sampling errors may be made self- 
compensating. If the sampling errors are handled in this way, the analysis 
is reducible to the general model established above. 

As an illustration, let us take an experiment in which the row factor 
consists of individuals (the row elements being the specific individuals) and 
the column factor consists of a series of tests of visual acuity (the column 
elements being the specific tests). Let us assume an observed value or score 
of 30 for the second replication in the cell intersected by the third row and 
the first column (i.e., $X_{312} = 30$). This observation or score is made up of a
number of components which sum to produce the specific value. We assume 
for the sake of exposition that the component contribution uniform for all 
observations in the third row is equal to 11 (i.e., the third row effect is equal 
to 11), that the component contribution of the visual acuity test in the first 
column for all of the observations which it encompasses is equal to 8 (i.e.,
the first column effect is equal to 8), that the component contribution of the
unique interaction between the individual in the third row and the test in
the first column for all of the observations in the ($r = 3$, $c = 1$) cell is equal
to 3 [i.e., the ($r = 3$, $c = 1$) cell interaction effect is equal to 3], that the
over-all value is equal to 6, and that the error involved in the second replication
in the ($r = 3$, $c = 1$) cell is equal to 2. Thus, the assumption of the
analysis of variance implies that the individual in the third row taking the
test in the first column for the second replication will obtain the specified
observed value or score of $11 + 8 + 3 + 6 + 2 = 30$.
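The additive composition of this one score can be written out as a minimal Python sketch. The function name `observed_value` and its parameter names are illustrative assumptions; the numerical values are those quoted in the example.

```python
def observed_value(overall, row_effect, column_effect, interaction_effect, error):
    """Sum the components that the analysis of variance assumes make up one score."""
    return overall + row_effect + column_effect + interaction_effect + error


# Values quoted in the text for the second replication in the (r = 3, c = 1) cell.
score = observed_value(overall=6, row_effect=11, column_effect=8,
                       interaction_effect=3, error=2)
print(score)  # 30
```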

The essential difference between the linear hypothesis model and the 
components of variance model is that the main effects of the former are 
fixed and constant whereas the main effects of the latter are random variables. 
All other differences result from the different mathematical treatments 
necessitated by this distinction. In order to have fixed and constant effects it 
is necessary that the elements of each factor be unique and not determined 
by random sampling; in order to have random effects the elements of the 
factors must be selected by simple random sampling from a larger population.
Thus, if the entire population of elements is included in a particular factor,
its effects are fixed; if a random sampling of elements from some larger
population is included in a particular factor, its effects are random. Examples 
of these kinds of effects will be presented later. The mixed model has one 
factor with fixed effects and the other factor with random effects. 

When one makes a statistical test on the row effects he is testing an 
hypothesis of the type: Among the elements of the row factor (whether these 
be fixed or random) there is no variation in the magnitude of the contribution 
to the obtained observations. In other words: All the row effects are equal. 
Similarly for the column and interaction effects. 


I. Linear Hypothesis Model 


For purposes of exposition it will be convenient to divide the linear 
hypothesis model into three cases: (a) no a priori assumption as to inter- 
action is made; (b) the a priori assumption is made that the interaction 
effects are equal, but no preliminary test of this assumption is desired; and 
(c) a preliminary test of the assumption of no interaction is made. 


Case (a): Linear hypothesis model; no interaction assumption. 


We assume

$$X_{ijk} = \mu_{ij.} + \mu_{i..} + \mu_{.j.} + \mu + e_{ijk}, \tag{1}$$

where

$\mu_{ij.}$ = fixed interaction effect, ($i = 1, \cdots, r$; $j = 1, \cdots, c$);
$\mu_{i..}$ = fixed row effect, ($i = 1, \cdots, r$);
$\mu_{.j.}$ = fixed column effect, ($j = 1, \cdots, c$);
$\mu$ = over-all value (most commonly called the general mean).


That is, we assume that a particular observation is determined by the sum of 
the over-all mean, the effect of the row of which it is a member, the effect 
of the column of which it is a member, the effect of the row by column inter- 
action for the cell of which it is a member, and an error term. 

We further assume that the $e_{ijk}$ are independent random variables,
normally distributed with mean equal to zero and variance equal to $\sigma^2$
(unknown).

We also have the assumptions

$$\sum_{i=1}^{r} \mu_{i..} = 0, \quad \sum_{j=1}^{c} \mu_{.j.} = 0, \quad \sum_{i=1}^{r} \mu_{ij.} = 0, \quad \sum_{j=1}^{c} \mu_{ij.} = 0. \tag{2}$$


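These latter restrictions do not involve any loss of generality; if the effects
which sum to make up $X_{ijk}$ do not meet these assumptions, new values,
which meet these restrictions as well as the additive assumption, may be
derived linearly from the original ones. That is, suppose we assume only that

$$X_{ijk} = \mu' + \mu'_{i..} + \mu'_{.j.} + \mu'_{ij.} + e_{ijk};$$

we may then derive

$$\mu = \mu' + \frac{1}{r}\sum_{i=1}^{r} \mu'_{i..} + \frac{1}{c}\sum_{j=1}^{c} \mu'_{.j.} + \frac{1}{rc}\sum_{i=1}^{r}\sum_{j=1}^{c} \mu'_{ij.}, \tag{3}$$

$$\mu_{i..} = \mu'_{i..} + \frac{1}{c}\sum_{j=1}^{c} \mu'_{ij.} - \frac{1}{r}\sum_{i=1}^{r} \mu'_{i..} - \frac{1}{rc}\sum_{i=1}^{r}\sum_{j=1}^{c} \mu'_{ij.}, \tag{4}$$

$$\mu_{.j.} = \mu'_{.j.} + \frac{1}{r}\sum_{i=1}^{r} \mu'_{ij.} - \frac{1}{c}\sum_{j=1}^{c} \mu'_{.j.} - \frac{1}{rc}\sum_{i=1}^{r}\sum_{j=1}^{c} \mu'_{ij.}, \tag{5}$$

$$\mu_{ij.} = \mu'_{ij.} - \frac{1}{c}\sum_{j=1}^{c} \mu'_{ij.} - \frac{1}{r}\sum_{i=1}^{r} \mu'_{ij.} + \frac{1}{rc}\sum_{i=1}^{r}\sum_{j=1}^{c} \mu'_{ij.}, \tag{6}$$

so that the derived values satisfy all of the assumptions and may be called
the “effects.”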
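The claim that arbitrary additive components can always be re-expressed as effects obeying the summation restrictions can be checked numerically. The following Python sketch is an illustration under stated assumptions: the function name `constrained_effects` and the starting values are hypothetical, and the centering used is the standard one for a two-way layout.

```python
def constrained_effects(mu0, rows0, cols0, inter0):
    """Re-express unrestricted additive components as effects that sum to zero.

    mu0: unrestricted over-all value; rows0[i], cols0[j], inter0[i][j]:
    unrestricted row, column, and interaction components.
    """
    r, c = len(rows0), len(cols0)
    row_bar = sum(rows0) / r
    col_bar = sum(cols0) / c
    # Mean of the interaction components over columns (per row) and over rows (per column).
    inter_row = [sum(inter0[i]) / c for i in range(r)]
    inter_col = [sum(inter0[i][j] for i in range(r)) / r for j in range(c)]
    inter_bar = sum(inter_row) / r

    mu = mu0 + row_bar + col_bar + inter_bar
    rows = [rows0[i] + inter_row[i] - row_bar - inter_bar for i in range(r)]
    cols = [cols0[j] + inter_col[j] - col_bar - inter_bar for j in range(c)]
    inter = [[inter0[i][j] - inter_row[i] - inter_col[j] + inter_bar
              for j in range(c)] for i in range(r)]
    return mu, rows, cols, inter


# Hypothetical unrestricted components for a 2 x 3 layout.
mu0 = 6.0
rows0 = [11.0, 4.0]
cols0 = [8.0, 2.0, 5.0]
inter0 = [[3.0, 0.0, 1.0], [2.0, -1.0, 4.0]]
mu, rows, cols, inter = constrained_effects(mu0, rows0, cols0, inter0)
```

The derived row, column, and interaction effects each sum to zero over their indices, while the expected value of every cell is unchanged.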

(Note that we make no assumptions as to the existence or vanishing of
the interaction effects for this case: i.e., the $\mu_{ij.}$ may be equal to 0 or to
$K_{ij.}$, where $K_{ij.} \neq 0$, for all $i$, $j$.)

An example of this model would be an experiment in which $c$ types of
psychotherapy (perhaps directive versus nondirective or interpretive versus
suggestive versus reflective) are employed by all $r$ psychotherapists of a
particular clinic in an attempt to have various subjects recall certain repressed
material. Under each of the $rc$ conditions there are $m$ subjects; each subject
is used under only one set of conditions (making the total number of subjects
$rcm$). The recorded score in each case would be the time required for the
particular therapist, with the particular therapeutic technique and with the
particular subject, to get this subject to recall spontaneously certain (controlled)
repressed material. Notice that we can generalize our results no
further than to the therapists in this particular clinic since we considered 
the therapists not as a random sample from some larger population but 
rather as being fixed and distinct. The $r$ therapists constitute our row factor
population. (This distinction will be clearer after the discussion relative to
the components of variance model is read.) [...] to generalize his results to
all psychot[...] techniques, he would have had to select [...] the entire
population of therapists in the area under consideration. It is acceptable to
use all of the psychotherapists in one particular clinic in this latter case
only if the assumption may be made that the therapists in the clinic represent
a random sample of all the therapists in the area to which the results will
be generalized. This case would constitute a mixed model.
The likelihood-ratio test (which is closely related to maximum likelihood
estimation) is widely used for testing hypotheses in statistics since this test 
has many optimum properties. To test the hypothesis that there is no difference
or variation among the row effects, or, what is equivalent, that there are
no row effects (all $\mu_{i..} = 0$), the likelihood-ratio test leads to the ratio (see
Table 1)

$$\frac{\text{row mean square}}{\text{within cells mean square}} \tag{7}$$

which is distributed as $F$ with $(r - 1)$ and $rc(m - 1)$ degrees of freedom
under the null hypothesis. [For proof see (8), pp. 59-60.]


Case (b): Linear hypothesis model; assumption of no interaction, but without 
preliminary test. 


As in case (a), we assume

$$X_{ijk} = \mu_{ij.} + \mu_{i..} + \mu_{.j.} + \mu + e_{ijk},$$

where the $e_{ijk}$ are independent random variables, normally distributed with mean
equal to zero and variance equal to $\sigma^2$ (unknown). And also

$$\sum_{i=1}^{r} \mu_{i..} = 0, \quad \sum_{j=1}^{c} \mu_{.j.} = 0, \quad \sum_{i=1}^{r} \mu_{ij.} = 0, \quad \sum_{j=1}^{c} \mu_{ij.} = 0.$$

This time, however, we make the additional assumption that there are no
effects due to interaction; that is, $\mu_{ij.} = 0$ for all $i = 1, \cdots, r$ and
$j = 1, \cdots, c$. (Note that this assumption makes the two assumptions above in
regard to the summing of the interaction effects over $i$ and $j$ redundant.)

In this case, the likelihood-ratio theory leads to the following quotient
for testing the hypothesis that there are no row effects (or all $\mu_{i..} = 0$):

$$\frac{\text{row mean square}}{\text{pooled interaction and within cells mean square}} \tag{8}$$

(the denominator being the sum of the interaction and within cells sums of
squares divided by their combined degrees of freedom),
which is also distributed as $F$, but with $(r - 1)$ and $(r - 1)(c - 1) +
rc(m - 1)$ degrees of freedom (if the null hypothesis is true). (For proof see
6, pp. 220-224.)

Thus we see how the appropriate term (according to the theory of likeli- 
hood-ratio tests) to be used in testing the existence of the row effects (or 
column effects by similar reasoning) depends on the accepted assumptions. 
If an experimenter's data fit the assumptions behind the linear hypothesis 
analysis of variance model, and he makes no assumption as to the existence 
or non-existence of interaction, he uses (7); but if the experimenter can 
assume on the basis of some a priori reasoning that no interaction exists 
with data which fit the assumptions behind the linear hypothesis model, he 


uses (8). 
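The difference between the two quotients lies only in the denominator. The sketch below computes both from raw data; it assumes that Table 1 uses the usual sums of squares for a balanced two-factor layout with replication, and the function name `row_f_ratios` and the data are hypothetical.

```python
def row_f_ratios(x):
    """Return the row F-ratios with the within cells and the pooled error terms.

    x[i][j][k] holds r rows, c columns, and m replications per cell.
    Standard balanced two-way sums of squares are assumed.
    """
    r, c, m = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / m for j in range(c)] for i in range(r)]
    row = [sum(cell[i]) / c for i in range(r)]
    col = [sum(cell[i][j] for i in range(r)) / r for j in range(c)]
    grand = sum(row) / r

    ss_rows = c * m * sum((row[i] - grand) ** 2 for i in range(r))
    ss_inter = m * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                       for i in range(r) for j in range(c))
    ss_within = sum((x[i][j][k] - cell[i][j]) ** 2
                    for i in range(r) for j in range(c) for k in range(m))

    df_rows = r - 1
    df_inter = (r - 1) * (c - 1)
    df_within = r * c * (m - 1)

    ms_rows = ss_rows / df_rows
    # Quotient of the form (7): within cells mean square as the error term.
    f_within = ms_rows / (ss_within / df_within)
    # Quotient of the form (8): interaction and within cells sums of squares pooled.
    f_pooled = ms_rows / ((ss_inter + ss_within) / (df_inter + df_within))
    return f_within, f_pooled


# Hypothetical 2 x 2 layout with m = 2 replications per cell.
data = [[[1, 3], [5, 7]],
        [[2, 4], [6, 10]]]
f_within, f_pooled = row_f_ratios(data)
print(f_within, f_pooled)
```

The two ratios are referred to F distributions with different denominator degrees of freedom, so they are not directly comparable without the corresponding critical values.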
Case (c): Linear hypothesis model; the use of a preliminary test.


Now the question arises as to the advisability and acceptability of the 
procedure of testing the significance of the interaction term by means of

$$\frac{\text{interaction mean square}}{\text{within cells mean square}} \tag{9}$$


(which will be referred to hereafter as the “preliminary test”) before deter- 
mining whether to use (7) or (8) for the final test. Thus, if the interaction 
term is significant when tested by (9), the within cells mean square is used 
as the error term; if the interaction term is not significant when tested by 
(9), the sum of the interaction and within cells sums of squares divided by 
their combined degrees of freedom is used. 

This is a compromise procedure which was originally derived on an 
intuitive basis by applied scientists in an attempt to utilize their experimental 
results and past knowledge to raise the power of their statistical test. [The 
power of a test is defined as one minus the probability of a type II error. See 
(10, pp. 246-248) for a good discussion of type I errors, type II errors, and 
power in the analysis of variance.] With this procedure the two tests (pre- 
liminary and final F) are not statistically independent, since they are both 
made on the same set of data. Thus, certain dangers are introduced. 

This lack of independence and mathematical neatness has led mathe- 
matical statisticians to shy away from this area of application until very 
recently (and indeed some apparently still condemn this whole process of 
making the preliminary and final tests on the same set of data). Since 1944, 
Bancroft (2), Mosteller (11), Paull (12), and Bechhofer (3) have made 
important contributions to [...] be referred to as “never pool” (involving
no assumption as to the existence or non-existence of interaction with no
preliminary test), “always pool” (involving the assumption of no interaction
with no preliminary test), and “sometimes pool” (where the error term in the
final F-test depends upon the results obtained in a preliminary test of the
significance of the interaction mean square).

Before presenting the formal summarization of the rules of procedure,
certain symbols for degrees of freedom will be defined to facilitate the
discussion. Accordingly, let

$$
\begin{aligned}
n_1 &= (r - 1), \\
n_2 &= (r - 1)(c - 1), \\
n_3 &= rc(m - 1), \\
n_4 &= (r - 1)(c - 1) + rc(m - 1);
\end{aligned}
$$

and let $F(\alpha_x; n_y, n_z)$ refer to the value which is exceeded by $F$ with probability
$\alpha_x$ under the null hypothesis for the degrees of freedom $n_y$ (numerator $\chi^2$)
and $n_z$ (denominator $\chi^2$); i.e.,

$$\Pr\{F > F(\alpha_x; n_y, n_z)\} = \alpha_x. \tag{10}$$

(The subscript of $\alpha$, that is, $x$, may be equal to 1, 2, or 3; the particular usage
will be explained later.)


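For testing the row effects in the two-factor, $m$ replications case the
statistical procedure may be summarized as follows:

“Never Pool”

Reject $\mu_{i..} = 0$, if

$$\frac{\text{row mean square}}{\text{within cells mean square}} > F(\alpha_2; n_1, n_3).$$

Accept $\mu_{i..} = 0$ otherwise.

“Always Pool”

Reject $\mu_{i..} = 0$, if

$$\frac{\text{row mean square}}{\text{pooled mean square}} > F(\alpha_3; n_1, n_4).$$

Accept $\mu_{i..} = 0$ otherwise.

“Sometimes Pool”

Reject $\mu_{i..} = 0$, if

$$\frac{\text{interaction mean square}}{\text{within cells mean square}} \geq F(\alpha_1; n_2, n_3) \quad \text{and} \quad \frac{\text{row mean square}}{\text{within cells mean square}} > F(\alpha_2; n_1, n_3); \tag{11}$$

or if

$$\frac{\text{interaction mean square}}{\text{within cells mean square}} < F(\alpha_1; n_2, n_3) \quad \text{and} \quad \frac{\text{row mean square}}{\text{pooled mean square}} > F(\alpha_3; n_1, n_4).$$

Accept $\mu_{i..} = 0$ otherwise.

(The pooled mean square is the sum of the interaction and within cells sums of
squares divided by $n_4$.)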
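The three rules can be sketched as decision functions in Python. This is an illustration only: the function and argument names are hypothetical, the critical values `f1`, `f2`, and `f3` (standing for the preliminary, unpooled final, and pooled final F-values) are assumed to be read from F tables by the user, and the numerical inputs are invented.

```python
def never_pool(ms_rows, ms_within, f2):
    """Reject (True) when the row F-ratio on the within cells error exceeds f2."""
    return ms_rows / ms_within > f2


def always_pool(ms_rows, ss_inter, ss_within, n2, n3, f3):
    """Reject (True) when the row F-ratio on the pooled error exceeds f3."""
    ms_pooled = (ss_inter + ss_within) / (n2 + n3)
    return ms_rows / ms_pooled > f3


def sometimes_pool(ms_rows, ss_inter, ss_within, n2, n3, f1, f2, f3):
    """Choose the error term by a preliminary test on the interaction."""
    ms_inter = ss_inter / n2
    ms_within = ss_within / n3
    if ms_inter / ms_within >= f1:
        # Interaction significant: do not pool.
        return ms_rows / ms_within > f2
    # Interaction not significant: pool interaction and within cells.
    ms_pooled = (ss_inter + ss_within) / (n2 + n3)
    return ms_rows / ms_pooled > f3


# Invented sums of squares for r = 4, c = 5, m = 3 (n2 = 12, n3 = 40),
# with one per cent critical values 2.66, 4.31, and 4.18.
reject_never = never_pool(4.25, 40.0 / 40, 4.31)
reject_sometimes = sometimes_pool(4.25, 12.0, 40.0, 12, 40, 2.66, 4.31, 4.18)
print(reject_never, reject_sometimes)  # False True
```

In this invented case the preliminary test does not reach significance, so the pooled error term with its larger degrees of freedom is used and the row hypothesis is rejected, whereas the unpooled test does not reject.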


Let us examine the advantages and disadvantages of each and the con- 
ditions under which each may be used. 

The “always pool” procedure (where the interaction effects are in fact
non-existent) provides a uniformly more powerful F-test than the “never
pool” procedure for equivalent type I errors. [A uniformly most powerful
test is one which is more powerful than all other possible tests (a test being
defined by its critical region) regardless of the alternative to the null
hypothesis which is assumed to be true.] If the “always pool” procedure is used
and there actually are interaction effects, the denominator in the final F-test
will tend to be too large, and the test will give too many non-significant
results when in fact the null hypothesis is not true. Increase in interaction
effects increases this distortion without limit, so that the research worker
may be working at the, say, 1/500 per cent level of significance although he
thinks he is working at the 5 per cent level. [See Table 2, p. 74 in Bechhofer
(3) for an indication of how bad this disturbance gets under various
conditions.]
The “sometimes pool” procedure is an attempt to avoid errors of this
sort; the preliminary test is expected to advise against pooling when the 
interaction is large. The “sometimes pool” procedure cannot be expected to 
eliminate this source of error (or disturbance) entirely. But it is useful if it 
keeps the type I error of the final F-test close to the level at which the in- 
vestigator thinks he is working. For equivalent type I errors this procedure 
also makes the power of the final F-test greater than the power of the final 
F-test under the “never pool" test. 

It will be convenient for further exposition to introduce a term which
summarizes the over-all magnitude of the interaction effects. Let this be 


\ = т У; ИЕ : (12) 
i=l] j= 
$\lambda$ equals zero only when the interaction effects are all equal to zero; it gets
proportionately larger as the $\mu_{ij.}$ deviate from zero.

When $\lambda$ is large, power and error characteristics make the use of the
“sometimes pool” test theoretically unjustified (3). It is even more precarious
to use the “always pool” test under these conditions. Thus, when an investigator
has no a priori evidence to indicate a particular value for $\lambda$, he uses
the “never pool” test, the routine use of the “sometimes pool” procedure
being theoretically unacceptable.

In those cases in which the experimenter has definite a priori reasons for
the belief that $\lambda$ is equal, or at least close, to zero (that is, all $\mu_{ij.}$
approximately equal to zero), and at the same time wants a certain amount of
protection from an inaccurate assumption, the use of the “sometimes pool”
procedure can be justified and is advantageous.

But more is involved than the mere caution that the use of the “sometimes
pool” procedure requires definite a priori evidence indicating zero
interaction. Since there is no preliminary test in the case of the “never pool”
and “always pool” tests, their power is completely determined once the
significance level of the final F-test is selected ([...] value of $\lambda$, and an
assumed-as-true alternative [...] hypothesis). When the “sometimes pool”
procedure is used, however, the significance level for the final F-test merely
limits [...] test are established.
Every combination of preliminary and final significance levels (for
fixed degrees of freedom) within the general category of the “sometimes pool”
procedure yields a different test. The “always pool” and “never pool” tests
may simply be thought of as special (or extreme) cases of the “sometimes
pool” procedure, with preliminary significance level equal to zero and one,
respectively. Accordingly, as the preliminary significance level of a “sometimes


ARNOLD BINDER 39 


pool” procedure is decreased, it approaches an "always pool" test; as the 
preliminary significance level is increased, the "sometimes pool" procedure 
approaches a “never pool" test. Although the power of the entire test is 
greater with smaller preliminary significance levels, these smaller levels 
provide less protection from the disturbance resulting from an error in judgment
as to interaction (particularly at the intermediate values of $\lambda$). Conversely,
although there is more protection from the potential disturbance
in total test significance level with larger preliminary significance levels, 
there is less gain in power over the corresponding “never pool" test. 

[These relationships of power and total test type I error to the level of 
the preliminary test are not monotonic for all conditions. The best tests to 
be recommended in this paper have definite advantages in both power and 
error characteristics over many alternate tests. Nevertheless, the relation- 
ship indicated above does hold for wide and important (for protection 
purposes) ranges (a) in the magnitudes of the degrees of freedom, (b) in the 
selected final significance level, (c) in the value of $\lambda$, and (d) in the possible
alternative to the main effects’ null hypothesis.] 

Bechhofer (3), for this model, and Paull (12), for the components of 
variance model, have worked out compromise tests which involve minimum 
danger of erroneous conclusions over the widest possible (for a uniform 
procedure) ranges in the values of interaction, the various degrees of freedom, 
and the possible alternatives to the main effects’ null hypothesis. The pro- 
cedure involves the free selection of the significance level of the final F-test; 
the preliminary significance level is established (by appropriate rules) so 
as to provide a test with the most desirable characteristics for the fixed final 


significance level.

What follows in this section is directed toward elaborating the statistical
rules for establishing, for a fixed final significance level, that preliminary
significance level for F which leads to the specific “sometimes pool” test
with the most favorable power characteristics and minimum disturbance in 
significance levels for the linear hypothesis model. 

Following Bechhofer (3), let

$$a = \left[\frac{(r - 1)(c - 1)}{rc(m - 1)}\right] F(\alpha_1; n_2, n_3), \tag{13}$$

$$b = \left[\frac{(r - 1)}{rc(m - 1)}\right] F(\alpha_2; n_1, n_3), \tag{14}$$

$$c = \left[\frac{(r - 1)}{(r - 1)(c - 1) + rc(m - 1)}\right] F(\alpha_3; n_1, n_4). \tag{15}$$

For fixed degrees of freedom $a$ is completely determined by $\alpha_1$, $b$ by $\alpha_2$, and
$c$ by $\alpha_3$. $\alpha_1$ is the level of significance to be used for the preliminary test.
$\alpha_2$ is the level of significance for the final F-test which the experimenter
would use if the preliminary test advises against pooling. $\alpha_3$ is the level of
significance for the final F-test which the experimenter would use if the
preliminary test recommends pooling. Notice that $\alpha_2$ defines the “never
pool” procedure when $\alpha_1 = 1$ and that $\alpha_3$ defines the “always pool”
procedure when $\alpha_1 = 0$. The above conditions define a “sometimes pool” test
whenever $0 < \alpha_1 < 1$, which is the situation that interests us at the moment.

Within the category of “sometimes pool” tests we make three distinctions
(3, p. 26):

$$
\begin{aligned}
&\text{class A tests when } c > \frac{b}{a + 1}, \\
&\text{borderline tests when } c = \frac{b}{a + 1}, \\
&\text{class B tests when } c < \frac{b}{a + 1}.
\end{aligned}
\tag{16}
$$

The particular “sometimes pool” test is thus automatically determined once
the significance levels ($\alpha_1$, $\alpha_2$, and $\alpha_3$) are selected.

The foregoing exhaust all of the possible relationships which may exist
between $c$ and $b/(a + 1)$. These values are under the control of the experimenter
in the sense that he is free to choose the $\alpha$-levels of significance; the
latter uniquely (for fixed degrees of freedom) determine $a$, $b$, and $c$. The usual
[...] the preliminary test. Suppose also, in this example, that $r = 4$, $c = 5$, and
$m = 3$. The F-value at the one per cent level for $n_2 = (r - 1)(c - 1) = 12$
and $n_3 = rc(m - 1) = 40$ is 2.66; for $n_1 = (r - 1) = 3$ and $n_3 = rc(m - 1) =
40$, this level is 4.31; for $n_1 = (r - 1) = 3$ and $n_4 = (r - 1)(c - 1) +
rc(m - 1) = 52$, this level is 4.18. This gives

$$a = \frac{12}{40}(2.66) = .798, \qquad b = \frac{3}{40}(4.31) = .323, \qquad c = \frac{3}{52}(4.18) = .241.$$

Since

$$.241 > \frac{.323}{.798 + 1} \quad \left(\text{that is, } c > \frac{b}{a + 1}\right),$$

this procedure amounted to a class A test. But, as Bechhofer (3, see particularly
Tables 2, 3, 4, 5A, 5B) has shown, class A tests do not have the
most favorable combination of power and error characteristics for this model. 
Thus, without definite knowledge or awareness, the experimenter selected 
a generally inferior test by employing this rather widely used procedure. 
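The constants and the class of a “sometimes pool” test can be recomputed with a short Python sketch. The function names `bechhofer_constants` and `classify` and the tolerance argument are hypothetical; the F-values 2.66, 4.31, and 4.18 are the tabled one per cent values quoted in the example.

```python
def bechhofer_constants(r, c, m, f_prelim, f_unpooled, f_pooled):
    """Return (a, b, c_const) from the design sizes and the three critical F-values."""
    n1 = r - 1
    n2 = (r - 1) * (c - 1)
    n3 = r * c * (m - 1)
    n4 = n2 + n3
    a = n2 / n3 * f_prelim
    b = n1 / n3 * f_unpooled
    c_const = n1 / n4 * f_pooled
    return a, b, c_const


def classify(a, b, c_const, tol=1e-9):
    """Classify a test by comparing c with b / (a + 1)."""
    threshold = b / (a + 1)
    if abs(c_const - threshold) <= tol:
        return "borderline"
    return "class A" if c_const > threshold else "class B"


# Example from the text: r = 4, c = 5, m = 3, one per cent level throughout.
a, b, c_const = bechhofer_constants(4, 5, 3, 2.66, 4.31, 4.18)
print(round(a, 3), round(b, 3), round(c_const, 3), classify(a, b, c_const))
```

With these inputs the constants round to .798, .323, and .241, and the comparison places the test in class A, in agreement with the worked example.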

After a very thorough evaluation of the power and error characteristics 
of the various types of “sometimes pool” tests, Bechhofer concluded that the 
borderline test was the over-all best bet in terms of relative assurance of 
freedom from erroneous experimental conclusions. The borderline test does, 
however, introduce a slight distortion in the whole test type I error when 
$\lambda = 0$. That is, if $\alpha_2$ is the type I error of the “never pool” test which the
experimenter ordinarily would use, the borderline test defined by $b(\alpha_2)$ and
$c(\alpha_3)$ has the property that its type I error will be larger than $\alpha_2$ when
$\lambda = 0$.

To illustrate how much larger this type I error gets let us consider one of
Bechhofer’s examples (3, p. 81). For $\alpha_2 = \alpha_3 = 0.05$ and for $n_1 = 2$, $n_2 = 2$,
and $n_3 = 6$, the maximum type I error of the borderline test (that is, when
$\lambda = 0$) is 0.0653. This distortion is just about the worst that can be
encountered in the two-factor, $m$ replications design since the type I error
decreases with either increasing $\lambda$ or with increasing $n_3$ (regardless of $\lambda$) and
approaches the limiting value of $\alpha_2$ ($= \alpha_3$) very rapidly.

Thus, as Bechhofer concludes, “There is strong justification for the use
of the borderline test under the circumstances specified. By tolerating a
small increase in size [type I error] the experimenter can achieve a [...]
large gain in power ... [when $\lambda = 0$]. He is protected against large [...]
since the power never will drop below the power of the conventional
‘never pool’ test he ordinarily would use.” (3, p. 112). The absolute gain in
power of the borderline test over the “never pool” test is a function of the
degrees of freedom, and is greatest for smaller values of $n_3$.

In addition to the advantage in gain of power of the borderline test, we
saw in the preceding paragraph that its use brings freedom from the
gross disturbances in type I error discussed [...].

The recommended procedure, then, where the experimenter has a priori reasons
for believing that the interaction effect is zero (or very close
to zero) but wishes to have some protection from the catastrophic
effects of a completely erroneous assumption, is as follows:

1. Establish the level of significance ($\alpha$) for the final F-test which would ordinarily
be used for a “never pool” test, and set $\alpha_2 = \alpha_3 = \alpha$.

2. Determine $b$ from

$$b = \left[\frac{(r - 1)}{rc(m - 1)}\right] F(\alpha_2; n_1, n_3), \quad \text{where } \alpha_2 = \alpha.$$

3. Determine $c$ from

$$c = \left[\frac{(r - 1)}{(r - 1)(c - 1) + rc(m - 1)}\right] F(\alpha_3; n_1, n_4), \quad \text{where } \alpha_3 = \alpha.$$

4. Determine $a$ from

$$a = \frac{b}{c} - 1. \quad \text{(borderline test)}$$

5. Since $a$ is determined, the F-value for the preliminary test, which gives the most
effective “sometimes pool” test, may be found by

$$F(\alpha_1; n_2, n_3) = \left[\frac{rc(m - 1)}{(r - 1)(c - 1)}\right] a.$$

6. Now that the three F-values are established, we can proceed with the rule of
procedure defined previously for the “sometimes pool” test, with the understanding that
the type I error will be slightly larger than anticipated.
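Steps 2 through 5 of the procedure can be collected into one function. The following Python sketch is illustrative only: the name `borderline_preliminary_f` and its arguments are hypothetical, and the two final critical F-values are assumed to be supplied from tables at the chosen level.

```python
def borderline_preliminary_f(r, c, m, f_unpooled, f_pooled):
    """Return the preliminary-test critical F-value that gives the borderline test.

    f_unpooled: tabled F for (r - 1) and rc(m - 1) degrees of freedom.
    f_pooled:   tabled F for (r - 1) and (r - 1)(c - 1) + rc(m - 1) degrees of freedom.
    """
    n1 = r - 1
    n2 = (r - 1) * (c - 1)
    n3 = r * c * (m - 1)
    n4 = n2 + n3
    b = n1 / n3 * f_unpooled      # step 2
    c_const = n1 / n4 * f_pooled  # step 3
    a = b / c_const - 1           # step 4: borderline condition c = b / (a + 1)
    return n3 / n2 * a            # step 5


# Design with r = 4, c = 5, m = 3 and one per cent values 4.31 and 4.18.
f_prelim = borderline_preliminary_f(4, 5, 3, 4.31, 4.18)
print(round(f_prelim, 3))
```

Carrying unrounded intermediate values gives a preliminary critical value of roughly 1.135, slightly above the 1.133 obtained when $b$, $c$, and $a$ are each rounded to three decimals.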


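[...] instead of using the one per cent level [...] the preliminary [...]
as follows (again $r = 4$, $c = 5$, $m = 3$):

1. The one per cent level of significance will be used for the final test regardless of
outcome of the preliminary test.

2. $b = \dfrac{3}{40}(4.31) = .323$,
where 4.31 is the F-value at the 1 per cent level of significance for $n_1 = 3$ and $n_3 = 40$.

3. $c = \dfrac{3}{(3)(4) + (4)(5)(2)}(4.18) = .241$,
where 4.18 is the F-value at the 1 per cent level of significance for $n_1 = 3$ and $n_4 = 52$.

4. $a = \dfrac{.323}{.241} - 1 = .340$.

5. $F(\alpha_1; 12, 40) = \dfrac{40}{12}(.340) = 1.133$.

6. Now we reject $\mu_{i..} = 0$ if

$$\frac{\text{interaction mean square}}{\text{within cells mean square}} \geq 1.133 \quad \text{and} \quad \frac{\text{row mean square}}{\text{within cells mean square}} > 4.31,$$

or if

$$\frac{\text{interaction mean square}}{\text{within cells mean square}} < 1.133 \quad \text{and} \quad \frac{\text{row mean square}}{\text{pooled mean square}} > 4.18;$$

and accept $\mu_{i..} = 0$ otherwise.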
II. Components of Variance Model


Again $X_{ijk}$ is the observation in the $i$th row ($i = 1, \cdots, r$), the $j$th
column ($j = 1, \cdots, c$), and for the $k$th replication ($k = 1, \cdots, m$). In this
model the row effects, the column effects, and the interaction effects are all
assumed to be random variables. And

$$X_{ijk} = U_i + V_j + W_{ij} + Z_{ijk}, \quad (i = 1, \cdots, r;\ j = 1, \cdots, c;\ k = 1, \cdots, m). \tag{17}$$

(Notice that there is no symbol representing the over-all value above,
since it is equal to zero for this model.)

$U_i$ = the $i$th row effect,
$V_j$ = the $j$th column effect,
$W_{ij}$ = the $i$th row by $j$th column interaction effect,
$Z_{ijk}$ = an error term.


[Roman letters represent the effects in this model; the Greek letter $\mu$ (in
various forms) represents the effects in the linear hypothesis model. This 
notation is in accord with the practice of mathematical statisticians to use 
Greek letters for parameters (population values) and Roman letters for 
random variables. In the components of variance model the variances involved 
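are the parameters.]

We further assume all $U_i$, $V_j$, $W_{ij}$, and $Z_{ijk}$ are independent and
normally distributed, with the following population values:

$$
\begin{array}{lcc}
 & \text{Mean} & \text{Variance} \\
U_i & \xi_U & \sigma_U^2 \\
V_j & \xi_V & \sigma_V^2 \\
W_{ij} & \xi_W & \sigma_W^2 \\
Z_{ijk} & \xi_Z & \sigma_Z^2
\end{array}
\tag{18}
$$

$(\xi = \xi_U + \xi_V + \xi_W + \xi_Z.)$

If we convert these to the values

$$R_i = U_i - \xi_U, \tag{19}$$
$$S_j = V_j - \xi_V, \tag{20}$$
$$T_{ij} = W_{ij} - \xi_W, \tag{21}$$
$$Q_{ijk} = Z_{ijk} - \xi_Z, \tag{22}$$

the new terms are independent and normally distributed with the population
values

$$
\begin{array}{lcc}
 & \text{Mean} & \text{Variance} \\
R_i & 0 & \sigma_R^2 = \sigma_U^2 \\
S_j & 0 & \sigma_S^2 = \sigma_V^2 \\
T_{ij} & 0 & \sigma_T^2 = \sigma_W^2 \\
Q_{ijk} & 0 & \sigma_Q^2 = \sigma_Z^2
\end{array}
\tag{23}
$$

$$X_{ijk} = \xi + R_i + S_j + T_{ij} + Q_{ijk}, \tag{24}$$

$$\sigma_X^2 = \sigma_R^2 + \sigma_S^2 + \sigma_T^2 + \sigma_Q^2 = \sigma_U^2 + \sigma_V^2 + \sigma_W^2 + \sigma_Z^2. \tag{25}$$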


As stated previously, the components of variance model differs from the 
linear hypothesis model in that the row and column effects are considered to 
be random variables, not fixed constants. That is, the individuals, tests, or 
situations which constitute the two factors in the components of variance 
model are randomly selected from two larger groups. 

An example of this would be the following study to test the hypothesis 
that among all of the schools in a particular county there is a real difference 
in the average reading ability (within each school) of the third graders. For 
the study, r schools are selected at random from all of the schools in the 
county, $c$ books are selected at random from all of the third-grade readers
used in the county, and $cm$ third-grade students are randomly chosen from
each of the $r$ schools. The students are each asked to read aloud 500 words
from one of the readers; their average reading speeds are the recorded values.
(The passages from the books are, incidentally, also chosen at random.)
There are, thus, $m$ observations in each cell representing the reading scores
of $m$ different children from the school in the $i$th row, each reading the book
in the $j$th column. It is assumed that the schools have what may be called
“contribution to reading speed” values for third graders of $U_1, U_2, \cdots, U_r$,
which constitute a sample of $r$ [...] $V_1, V_2, \cdots, V_c$; the latter [...]
a normal distribution, drawn [...]. If $r$ other schools had been chosen, the values
$U_1, U_2, \cdots, U_r$ would have been different, and if $c$ other third-grade readers
had been chosen the $V_1, V_2, \cdots, V_c$ would have been different. [We are
not interested in any “basic reading capacity” which the children may have [...]]

Thus, the results of this study may be generalized to the population as a
whole (all the schools in the particular county in the case of the row effects— 
the test of the above hypothesis—and the entire population of third-grade 
readers used in the county in the case of the column effects). 

Notice that we referred to the $U_i$, $V_j$, and $W_{ij}$ as the row, column, and
interaction effects, respectively. As a result of this, the testing of the row 
effects (for example) for significance in this model amounts to a test of an 
hypothesis of the sort: In the population as a whole, from which the sample 
was drawn, there is no difference in the row effects among the individual 
elements. In the above example this means we test the hypothesis that there 
is no difference in (or variation among) the “contribution to reading speed" 
values of the various schools (the row effects) and/or that there is no differ- 
ence in (or variation among) the “ease of reading” contributing values of the 
books (the column effects) in the two populations. Thus, with this model the 
null hypothesis that the row effects do not vary in the population takes the
form $\sigma_U^2$ (or $\sigma_R^2$) $= 0$. ($\sigma_V^2 = \sigma_S^2 = 0$ in the case of the columns.)

If we had called the $R_i$, $S_j$, and $T_{ij}$ the row, column, and interaction
effects, respectively, the type of hypothesis we test and the preliminary
assumption would be analogous to that of the linear hypothesis model.
First, we would test an hypothesis such as that there are no row effects, rather
than that there is no row effect difference or variation. This follows from the
fact that the $R_i$ (which have means equal to zero) are all identically zero and
thus non-existent when $\sigma_R^2 = 0$, since their distribution is concentrated at
the point zero. Also, our preliminary working equation for this model, showing
the additive composition of the observed values, would have included a term
representing an over-all value as in the linear hypothesis case. This term
would be $\xi$, and, as previously, it would represent an over-all or general
mean. Referring to the $U_i$, $V_j$, and $W_{ij}$ as the effects is consistent with
common practice and emphasizes the distinction between the effects in the
linear hypothesis model and those of the components of variance model, as
well as the treatment differences necessitated by this distinction.

As was the case with the linear hypothesis model, the choice of the
proper term for testing the null hypothesis (in this case $\sigma_U^2 = 0$) depends upon
the assumptions made relative to the interaction effects. If the investigator
makes no assumptions as to the equivalence of the interaction effects, he
tests the hypothesis $\sigma_U^2 = 0$ by

$$\frac{\text{row mean square}}{\text{interaction mean square}}$$

which is distributed as $F$ only when $\sigma_U^2 = 0$, the numerator being too large
otherwise. (For a delineation of the proof see 10, pp. 345-346.) This is the
“never pool” procedure. Note that the interaction mean square is the proper
error term for this model under these conditions; the within cells mean
square used as the error term in the linear hypothesis model is not the correct
error term here.

If the investigator has ample reason to make the assumption that the
interaction effects are identical (that is, $\sigma_i^2 = 0$), he may use the “always
pool” procedure and test the hypothesis $\sigma_r^2 = 0$ by

$$\frac{s_r^2}{s_{i+w}^2}.$$

[For the essential features of the proof see (10), pp. 345-346.] This is the
same test as used in the “always pool” procedure for the linear hypothesis 
model. As in the case of the other model, too, this procedure provides a 
uniformly more powerful test when the assumption is true. 

Again there is motivation for the use of a preliminary test of significance 
by reason of doubt as to the validity of the assumption concerning the inter- 
action effects. Contrary to the linear hypothesis model, the use of the “always 
pool” procedure, when there is in fact an interaction effect variation, results 
in the final F-test denominator tending to be too small, with the test giving 
too many significant results when the null hypothesis is true. With increase 
in interaction effect variations, this disturbance increases without limit as 
before, so that the experimenter may think he is operating at the 5 per cent
level of significance while he may actually be operating at, say, the 37 per cent
level.

The same preliminary test is used as for the preceding model, namely,

$$\frac{s_i^2}{s_w^2}.$$

The rule of procedure for this model may be summarized as follows:

“Never pool”
Reject $\sigma_r^2 = 0$, if

$$\frac{s_r^2}{s_i^2} > F(\alpha_2; n_1, n_2).$$

Accept $\sigma_r^2 = 0$ otherwise.

“Always pool”
Reject $\sigma_r^2 = 0$, if

$$\frac{s_r^2}{s_{i+w}^2} > F(\alpha_3; n_1, n_{23}).$$

Accept $\sigma_r^2 = 0$ otherwise.

“Sometimes pool”
Reject $\sigma_r^2 = 0$, if

$$\frac{s_i^2}{s_w^2} \geq F(\alpha_1; n_2, n_3) \quad \text{and} \quad \frac{s_r^2}{s_i^2} > F(\alpha_2; n_1, n_2);$$

or if

$$\frac{s_i^2}{s_w^2} < F(\alpha_1; n_2, n_3) \quad \text{and} \quad \frac{s_r^2}{s_{i+w}^2} > F(\alpha_3; n_1, n_{23}).$$

Accept $\sigma_r^2 = 0$ otherwise.

Paull has found that a class A test has the most desirable properties
for this mathematical model. This class A test, which is described below,
is recommended “as one which tends to stabilize the disturbances at inter-
mediate values of [the ratio of the expected value of the interaction mean
square to the expected value of the within cells mean square] while still
taking advantage of a considerable portion of the possible gain in power at
values of [this ratio] near one” (12, p. 544). When this ratio, which is analogous
to the А of the linear hypothesis model, is large, there is little disturbance
with the “sometimes pool” tests, since pooling is almost never advised.
Paull recommends this procedure as the best compromise between the lower
preliminary F significance levels (5 per cent, etc.), which do little to counter-
act the possible disturbance in the type I error of the final F-test, and the
higher preliminary F significance levels (70 per cent, etc.) which provide
little increase in power over the “never pool” procedure.
As a matter of fact, the effective or total test significance level when
$\alpha_2 = \alpha_3 = 0.05$ and the preliminary F-test is made at the same level
($\alpha_1 = 0.05$) is considerably above 10 per cent for wide ranges in the value of the
ratio of the expected value of the interaction mean square to the expected
value of the within cells mean square.
The recommended class A procedure consists of pooling the interaction
and within cells mean squares when their ratio is less than $2F(.50; n_2, n_3)$;
that is, accept $\sigma_i^2 = 0$, if

$$\frac{s_i^2}{s_w^2} < 2F(.50; n_2, n_3),$$

and then carry on with the final F-test using $s_r^2/s_i^2$ if $\sigma_i^2 = 0$ is not accepted,
and $s_r^2/s_{i+w}^2$ if $\sigma_i^2 = 0$ is accepted. [Fifty per cent points for the F-distribution
may be found in the tables compiled by Merrington and Thompson (9).]

Let us illustrate this procedure with fictitious data like that used to
show the linear hypothesis “sometimes pool” procedure (one per cent level
of significance for the final test with $r = 4$, $c = 5$, and $m = 3$). The F-value
for the 50 per cent level of significance with $n_2 = (r - 1)(c - 1) = 12$ and
$n_3 = rc(m - 1) = 40$ is equal to .961; the F-value for the one per cent level
of significance with $n_1 = (r - 1) = 3$ and $n_2 = (r - 1)(c - 1) = 12$ is
5.95; and the F-value for the one per cent level of significance with $n_1 = 3$
and $n_{23} = n_2 + n_3 = 52$ is 4.18.

We will reject $\sigma_r^2 = 0$ if

$$\frac{s_i^2}{s_w^2} > 1.922 \quad \text{and} \quad \frac{s_r^2}{s_i^2} > 5.95;$$

or if

$$\frac{s_i^2}{s_w^2} < 1.922 \quad \text{and} \quad \frac{s_r^2}{s_{i+w}^2} > 4.18.$$

The constant multiplier 2 is arbitrary and a smaller value may be used
where the experimenter is willing to sacrifice some power in the final F-test
for increased assurance against extreme disturbance in significance level. A
simpler rule which, according to Paull, may be used when the degrees of
freedom of both rows and columns are greater than 6 is to pool the interaction
and within cells mean squares when their ratio is less than 2. This is approxi-
mately equivalent to the above procedure, for large degrees of freedom, and
does not necessitate reference to the F-table.
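
The class A “sometimes pool” rule can be sketched in a few lines of Python. This is only an illustrative sketch and not part of Paull's presentation: the function and argument names are hypothetical, the mean squares in the usage lines are invented, and the pooled mean square is assumed to be the usual degrees-of-freedom-weighted average of the interaction and within cells mean squares. The critical values .961, 5.95, and 4.18 and the degrees of freedom 12 and 40 are those quoted in the text.

```python
def pooled_mean_square(ms_inter, df_inter, ms_within, df_within):
    """Pool two mean squares, weighting each by its degrees of freedom.

    Assumption: this weighted average is the intended pooled error term.
    """
    return (df_inter * ms_inter + df_within * ms_within) / (df_inter + df_within)


def sometimes_pool_test(ms_rows, ms_inter, ms_within, df_inter, df_within,
                        f50, f_final_unpooled, f_final_pooled, multiplier=2.0):
    """Apply the class A rule and return (pooled, reject_row_hypothesis).

    f50              -- 50 per cent point of F for (df_inter, df_within)
    f_final_unpooled -- critical F for the final test against interaction
    f_final_pooled   -- critical F for the final test against the pooled term
    multiplier       -- the constant multiplier (2 in the text)
    """
    # Preliminary test: pool when interaction / within is below multiplier * F(.50).
    pooled = (ms_inter / ms_within) < multiplier * f50
    if pooled:
        error = pooled_mean_square(ms_inter, df_inter, ms_within, df_within)
        reject = (ms_rows / error) > f_final_pooled
    else:
        reject = (ms_rows / ms_inter) > f_final_unpooled
    return pooled, reject


# Critical values quoted in the text; mean squares below are hypothetical.
F50, F01_UNPOOLED, F01_POOLED = 0.961, 5.95, 4.18

print(sometimes_pool_test(60.0, 15.0, 10.0, 12, 40, F50, F01_UNPOOLED, F01_POOLED))
print(sometimes_pool_test(60.0, 25.0, 10.0, 12, 40, F50, F01_UNPOOLED, F01_POOLED))
```

In the first call the ratio of interaction to within cells mean square is 1.5, which is below 1.922, so the two are pooled before the final test; in the second the ratio is 2.5 and the interaction mean square alone serves as the error term.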


III. Mixed Model 


Let us examine the case where the experimental data fit neither the 
assumptions of the linear hypothesis model nor those of the components of 
variance model exclusively, but fit the assumptions of a combination of the 
two. This is commonly called a mixed model. 


We assume for the mixed model that the effects of one of the factors
(say the columns) have been obtained by the random selection of elements
representing that factor, while assuming that each element of the other
factor (say the rows) has a constant effect which is typical for that element
(i.e., no random sampling but rather fixed effects).

We assume that each observation is composed as follows:

$$X_{ijk} = \xi + \mu_i + S_j + T_{ij} + Q_{ijk}, \tag{27}$$

where again $(i = 1, \cdots, r)$, $(j = 1, \cdots, c)$, $(k = 1, \cdots, m)$ and where $\xi$, $S_j$, $T_{ij}$,
and $Q_{ijk}$ are derived in the same way as in the components of variance model
above. The $\mu_i$ are constants such that

$$\sum_{i=1}^{r} \mu_i = 0. \tag{28}$$

$S_j$, $T_{ij}$, $Q_{ijk}$ are normal, independent, random variables with the population
values

             Mean    Variance
  $S_j$        0     $\sigma_c^2$
  $T_{ij}$     0     $\sigma_i^2$         (29)
  $Q_{ijk}$    0     $\sigma^2$

As in the case of the linear hypothesis model, we may wish to test the
hypothesis of no row effects ($\mu_i = 0$, for $i = 1, \cdots, r$); or, as in the case of
the components of variance model, we may wish to test the hypothesis that
the column effects are identical ($\sigma_c^2 = 0$). An example of this model was
presented incidentally above as a variation of the linear hypothesis example.

There are two schools of thought in the literature as to the proper error
term for testing the hypothesis of no random effect variation ($\sigma_c^2 = 0$ in the
case presented above) when $\sigma_i^2 \neq 0$. On the one hand are those represented by
Anderson and Bancroft (1, p. 340) who assume fixed interaction effects for all
of the observations encompassed by a given random main effect. That is, they
assume that the entire population of interaction effects for each of the random
elements is included in the sample since the entire population of elements
intersected by the random elements is so included. In the notation of this
paper this assumption implies (in the case where the column effects are
random and the row effects are fixed)

$$\sum_{i=1}^{r} T_{ij} = 0.$$

Those represented by Mood (10, p. 348), on the other hand, assume that the 
interaction effects of the observations encompassed by both the random 
effects (columns above) and fixed effects (rows above) may be treated as 
random sampling variables exactly as in the case of the components of 
variance model. 

The model advocated by Mood leads to an expected value of the random
(main) effects mean square which includes the term $m\sigma_i^2$, while the model
advocated by Anderson and Bancroft leads to a random (main) effects mean
square whose expected value does not include $m\sigma_i^2$. Thus, the expected value
of the random column effects mean square above is $\sigma^2 + mr\sigma_c^2 + m\sigma_i^2$ under
the Mood assumption, and $\sigma^2 + mr\sigma_c^2$ under the Anderson and Bancroft
assumption. As a consequence of this difference, the proper error term for
testing the hypothesis of identical column elements (when $\sigma_i^2 \neq 0$) is the
within cells mean square in the case of the Anderson and Bancroft model
and the interaction mean square in the case of the Mood model.

If the position of Mood is accepted (and this implies that the interaction 
effect resulting from the coming together of a specific random element and a 
specific fixed element shows random error variation) the rule of procedure 
for this model for testing both the hypothesis of no row effects ($\mu_i = 0$) and
the hypothesis of no column effect variation ($\sigma_c^2 = 0$) is identical to the rule
for the components of variance model. The pooling procedure recommended 
by Paull (12) is applicable to the mixed model under Mood’s assumption 
when the main effects to be tested include the random variable. 

The rule of procedure if the position of Anderson and Bancroft is accepted 
(which is consistent with the general scheme presented early in this paper 
and usually more defensible) differs from that of the components of variance 
model in only one respect. The error term for testing the hypothesis of no
column effect variation, when $\sigma_i^2 = 0$ is rejected, is $s_w^2$ instead of $s_i^2$. The rule
of procedure for testing the hypothesis of no row effect is identical to the
components of variance model (except, of course, that an hypothesis of the
sort $\mu_i = 0$ is accepted or rejected in the case of the mixed model).

Where one wishes to test the fixed main effects with the Mood model or
where one wishes to test either of the main effects with the Anderson and
Bancroft model, no specific recommendations for a preliminary significance
level can be made at this time. The most satisfactory pooling procedure in
terms of minimum disturbance or deviation in the significance level at which
the experimenter thinks he is working and maximum power has, as yet, not
been worked out under these conditions. All that has been said for the preceding
models regarding (a) the motivations for using each of the procedures, and (b)
the dangers and necessary cautions in using the “always pool” and “some-
times pool” procedures, applies equally to this model. It is as true for this
model as it is for the other two that an investigator should not use either of
these latter two procedures unless he has strong reason to believe that there
are no interaction effects or interaction-effect differences, as the case may be.

A RATIONAL CURVE RELATING LENGTH OF REST PERIOD
AND LENGTH OF SUBSEQUENT WORK PERIOD
FOR AN ERGOGRAPHIC EXPERIMENT*


LEDYARD R TUCKER


PRINCETON UNIVERSITY 
AND 
EDUCATIONAL TESTING SERVICE 


А rational function is developed relating the length of a rest period and 
length of subsequent work period in an ergographic situation. Simple energistic 
postulates are used for a critical organ or neuromuscular structure whose 
failure to perform adequately results in a stoppage of the work period. Experi- 
mental results for two subjects using a finger ergograph indicate that the func- 
tion yields the general trend of the data but that there seem to be some 
systematic deviations of the data from the present rational function. One 
parameter determined from the data represents rate of recovery from moder-
ate fatigue. It is hoped that this development will aid in studies of motor
fatigue involving such other variables as age and motivation.


The idea for the present rational development occurred during a perusal
of general literature on work decrement. A number of investigators have used
the ergograph in a variety of studies ranging from those concerned with per-
sonality characteristics to those dealing with work in industry. While consid-
erable progress has been made by physiologists on characteristics of active
muscles and nerves, there seems to have been only moderate success in appli-
cation of these physiological developments to the problems encountered by
psychologists in dealing with behavior of integrated, intact individuals.
Indeed, there are a number of instances where psychologists claim that be-
havior such as exhibited with the ergograph cannot be accounted for on purely
physiological and energistic grounds. The difficulty may be in finding how the
various physiological details can be incorporated into descriptions of behavior
of the complete individual. A second possibility is that psychologists have not
considered sufficiently simple and limited behavioral situations to observe the
physiological and energistic determiners of behavior. In the present case a few
simple energy relations are postulated which only approximate the relations
that might be determined on physiological grounds. These simple relations,
however, permit development of a functional relation observable in the per-
formance of an individual in a limited ergographic experiment. Psychologists
may find the present development of use in studying more complex situations.

After an individual has performed a constant, repetitive, motor task to 
such an extent that he no longer can continue, a rest period will result in the 
individual’s being able to perform the task again for some work period before 
again being unable to continue. A graph relating length of rest period and
length of subsequent work period will be of the form of those shown in Figure 1.
A similar result was obtained empirically by Manzer (2). Short rest periods
will be followed by short work periods, longer rest periods will be followed by
longer work periods. As the rest periods are lengthened, the subsequent work
periods should also lengthen, approaching a limiting length.

FIGURE 1
Rest-Work Curves for Two Subjects
(Length of work, W, in seconds, plotted against length of rest, R, in seconds,
for Subject 1 and Subject 2; an asymptote is indicated for each curve.)

In developing a rational function, several assumptions are made con-
cerning the energy relations within the organ or neuromuscular structure whose
failure to function adequately is responsible for the work stoppage. The organ
might be the contracting muscle, or it might be one of the nervous elements
involved in the excitation of the muscle. We will consider at this time only the
critical organ or structure whose failure to function adequately results in
failure of the individual to perform the task. It is assumed that fatigue of other
organs or structures will have little effect on length of the work period so long
as they do function. This is probably an oversimplification of the
situation. Interaction between organs or structures probably does occur such
that fatigue of one results in greater expenditure of energy by others in order
to continue the task. This interaction is being ignored in the
present development.

Consider an organ that is using energy at some constant rate during a
work period. The supply of energy immediately available to the organ to be
used in performing the task is being depleted. If this energy is being replen-
ished at a slower rate than that at which it is being used, the supply of energy
will be reduced. When the energy level falls to some critical point, the indi-
vidual will be unable to continue the task and the work period will end.
During a rest period the energy supply of the organ will be replenished to
an extent dependent on the length of the rest period and the rate at which
energy is being made available to the organ by the rest of the individual’s
body. (For the present development, the nature of the physiological mechan-
isms involved is not of immediate relevance.) At the end of such a rest period,
the immediately available energy supply of the organ will again support per-
formance of the task during a subsequent work period.
Consider the following postulates and definitions. Let:

$E_t$ = energy immediately available to organ at time $t$; (1)
$E_m$ = energy immediately available to organ when it is in a completely rested
state; (2)
$a$ = rate of expenditure of energy during work period (postulated to be a
constant); (3)
$C(E_m - E_t)$ = postulated rate at which body replaces energy to the organ; (4)
$W$ = length of work period; and (5)
$R$ = length of rest period. (6)

It is to be noted that postulate (3) forms a limit on the type of situation to
which the present development is appropriate. The task should not be one for
which the individual may work faster when more rested and then slow down
as he becomes fatigued, nor a task that may vary with the fatigue of the
individual. The common type of ergographic series, where there may be long
initial strokes followed by short strokes as the individual tires, is inappropriate
for the present development. In an ergograph situation the strokes should be
of constant length and made at constant timing. Inability to make a stroke of 
standard length is to be interpreted as failure to perform the task. Thus, the 
individual is not driven to such fatigue that he cannot make a stroke of any 
length; he just cannot make one of standard length. Even in this case, this 
assumption of a constant rate of expenditure of energy is probably an approxi- 
mation. 

Postulate (4) involves the simple concept that energy replacement occurs 
at a rate proportional to the extent of deficiency below a maximum amount of 
energy available. This maximum amount of energy available is that which 
would be present in a completely rested organ. $C$ is the constant of propor-
tionality. $(E_m - E_t)$ is the extent of energy deficiency. This postulate is prob-
ably a gross approximation to a true function which could be determined from
physiological considerations, but it should be usable for cruder developments 
and for cases involving a limited task, such as the flexion of a finger. This 
postulate would probably be inappropriate for more extensive tasks involving 
a large proportion of the body. 
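
Postulates (3) and (4) can be explored numerically before any closed-form result is derived. The following Python sketch steps the energy level forward in small time increments, replacing energy at the rate C(E_m - E_t) during rest and, during work, also expending it at the constant rate a until the critical level is reached. It is an illustration only: the function names, the simple Euler stepping, and all parameter values (E_MAX, E_CRIT, C, A) are hypothetical and are not taken from the experiment.

```python
def simulate_rest(energy, e_max, c, duration, dt=0.001):
    """Energy level after resting for `duration` seconds (postulate 4 only)."""
    for _ in range(round(duration / dt)):
        energy += c * (e_max - energy) * dt
    return energy


def simulate_work(energy, e_crit, e_max, a, c, dt=0.001, t_max=3600.0):
    """Seconds of work until the energy level falls to the critical level.

    During work, energy is expended at the constant rate `a` (postulate 3)
    while replacement continues at the rate c * (e_max - energy) (postulate 4).
    """
    elapsed = 0.0
    while energy > e_crit and elapsed < t_max:
        energy += (-a + c * (e_max - energy)) * dt
        elapsed += dt
    return elapsed


# Hypothetical parameter values chosen only for illustration.
E_MAX, E_CRIT, C, A = 100.0, 20.0, 0.01, 1.85

for rest_seconds in (20.0, 60.0, 240.0):
    restored = simulate_rest(E_CRIT, E_MAX, C, rest_seconds)
    work_seconds = simulate_work(restored, E_CRIT, E_MAX, A, C)
    print(rest_seconds, round(work_seconds, 1))
```

With these assumed values the simulated work periods lengthen as the rest periods lengthen, but by smaller and smaller amounts, which is the general shape described for the rest-work curves.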


FIGURE 2
Relation Between Energy Available and Time During a Series of
Work and Rest Periods
(Energy available, $E_t$, plotted against time, $t$.)

Consider the curve of energy available versus time in Figure 2. At time
$t_0$ the energy level is $E_0$. As an initial work period proceeds, the energy level
decreases along the curve until it reaches $E_1$, at which there is insufficient
energy for the organ to continue its activity. The time interval for this initial
work period is $W_0$.

A rest period of duration $R_1$ is now imposed and the energy level builds
up to $E_2$. During the following work period the energy level reduces to $E_3$, a
critical level between continued and non-continued performance of the task.
Since the task has not been altered, we might postulate that

$$E_3 = E_1. \tag{7}$$

The duration of this work period is $W_1$.

Consider that a second rest period of duration $R_2$ is imposed, which is fol-
lowed by a work period of duration $W_2$. The terminal energy level $E_5$ is again
the critical level between continued and non-continued performance of the
task, and, therefore, is equal to $E_1$ and $E_3$. Let

$$R_2 = R_1. \tag{8}$$

These two rest periods started with the identical energy levels $E_1$ and $E_3$ as
postulated in (7); thus, if the energy restoration conditions are identical as
postulated in (4), the energy levels at the end of these rest periods should be
identical, i.e.,

$$E_2 = E_4. \tag{9}$$

It might be expected, then, that the two subsequent work periods would be
identical also, i.e.,

$$W_2 = W_1. \tag{10}$$


This logic would lead to an expectation that a long sequence of rest-work
periods with equal rest periods would have equal work periods. Yochelson (3)
has reported data indicating such constancy in work periods following definite
rest periods in long sequences of rest and work periods. Data gathered in the
experimental try-out of this development also tended to support this con-
tention. A finger ergograph working against spring tension with a fixed excur-
sion to a block was used. The rate of contractions was set at one contraction
per second. Preliminary trials revealed considerable initial practice effect,
session to session, for a subject. During practice sessions some long sequences
of rest and work periods with equal rest periods were tried out. The following
are results for the sixth practice session for one subject using a series of 60-second
rest periods. The lengths of the work periods, in seconds, were 54,
36, 30, 28, 28, 28, 33, 31, 30, 34, 28, 28, 31, 39, 28, 28. The first
one of the work periods in this series should not be counted. It corresponds to the
initial work period $W_0$ before any of the fixed rest periods and is to be ex-
pected to be long. The remaining work periods seem to vary within a
constant band with no apparent progressive decrement. Presumably this will
hold only for a finite time and the experiment should not involve excessive
sessions.

During the rest period $R_1$, the rate of change of energy with time can be
obtained from postulate (4). Only energy replacement is considered to be
active during the rest period.

$$\frac{dE_t}{dt} = C(E_m - E_t). \tag{11}$$

Integration yields

$$E_m - E_t = f e^{-Ct}, \tag{12}$$

where $f$ is a constant of integration. When the terminal times $t_1$ and $t_2$ and the
corresponding energy levels $E_1$ and $E_2$ are substituted into (12), one obtains

$$\frac{E_m - E_2}{E_m - E_1} = \frac{f e^{-Ct_2}}{f e^{-Ct_1}} \tag{13}$$

$$= e^{-Ct_2 + Ct_1} \tag{14}$$

$$= e^{-C(t_2 - t_1)}. \tag{15}$$

It is to be noted that the length of the rest period is

$$R = t_2 - t_1, \tag{16}$$

since $t_2$ and $t_1$ are the end and beginning times. The subscript to $R$ is being
dropped for convenience. Then (15) can be written

$$\frac{E_m - E_2}{E_m - E_1} = e^{-CR}, \tag{17}$$

or, solving for $E_2$,

$$E_2 = E_m - (E_m - E_1)e^{-CR}. \tag{18}$$
Consider the subsequent work period. Energy is being used at a constant
rate as per postulate (3) as well as being replenished as per postulate (4). Thus,

$$\frac{dE_t}{dt} = -a + C(E_m - E_t). \tag{19}$$

Integration yields

$$E_m - E_t - \frac{1}{C}a = g e^{-Ct}, \tag{20}$$

where $g$ is a constant of integration.

Substitution of limiting times $t_2$ and $t_3$ and energy levels $E_2$ and $E_3$ and
writing as a ratio yields

$$\frac{E_m - E_3 - \frac{1}{C}a}{E_m - E_2 - \frac{1}{C}a} = \frac{g e^{-Ct_3}}{g e^{-Ct_2}} \tag{21}$$

$$= e^{-Ct_3 + Ct_2} \tag{22}$$

$$= e^{-CW}, \tag{23}$$

where

$$W = t_3 - t_2. \tag{24}$$

Substituting $E_1$ for $E_3$ as per (7) and solving for $E_2$ yields

$$E_2 = E_m - \frac{1}{C}a - \left(E_m - E_1 - \frac{1}{C}a\right)e^{CW}. \tag{25}$$

In relating the work period and rest period, the two expressions for $E_2$ in
(18) and (25) are equated to yield

$$E_m - (E_m - E_1)e^{-CR} = E_m - \frac{1}{C}a - \left(E_m - E_1 - \frac{1}{C}a\right)e^{CW}. \tag{26}$$

Subtracting $E_1$ from both sides of the equation,

$$E_m - E_1 - (E_m - E_1)e^{-CR} = E_m - E_1 - \frac{1}{C}a - \left(E_m - E_1 - \frac{1}{C}a\right)e^{CW}. \tag{27}$$

Or,

$$(E_m - E_1)\left(1 - e^{-CR}\right) = \left(E_m - E_1 - \frac{1}{C}a\right)\left(1 - e^{CW}\right), \tag{28}$$

$$\frac{(E_m - E_1)}{\left(E_m - E_1 - \frac{1}{C}a\right)}\left(1 - e^{-CR}\right) = \left(1 - e^{CW}\right). \tag{29}$$

Define

$$B = \frac{(E_m - E_1)}{\left(E_m - E_1 - \frac{1}{C}a\right)}. \tag{30}$$

Then

$$B\left(1 - e^{-CR}\right) = \left(1 - e^{CW}\right). \tag{31}$$

Or, solving for $e^{CW}$,

$$e^{CW} = 1 - B + Be^{-CR}. \tag{32}$$

Thus $e^{CW}$ is linearly related to $e^{-CR}$ with a slope of $B$ and intercept of $1 - B$.
It is interesting to note that when the numerator and denominator of the
right side of (30) are multiplied by $C$,

$$B = \frac{C(E_m - E_1)}{-a + C(E_m - E_1)}. \tag{33}$$

Thus, from (11) and (19),

$$B = \frac{\dfrac{dE_t}{dt}\ \text{(for the rest period)}}{\dfrac{dE_t}{dt}\ \text{(for the work period)}}, \tag{34}$$

where both rates are taken at the critical energy level $E_t = E_1$.

Another point of interest is that the relation given in (31) or (32) does not
involve directly the amount of energy expended. Only measures of duration of
rest and work periods need be determined. It is not necessary to observe the
energy expended as is frequently attempted in ergographic experiments by
computing the work performed by the muscle. (In case the muscle is not the
critical organ responsible for the work stoppage, the work performed by the
muscle would not be equal to the energy expenditure to be considered in case
it were necessary to determine the constant $a$.) This fortunate feature is due to
the restriction to a situation for which there is a constant rate of energy ex-
penditure.
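
Relation (32) can be solved for W, giving the length of work period expected after a rest period of length R once B and C are known. The short Python sketch below does this; the function names are hypothetical, and the values B = -.76 and C = .010 are the graphical estimates reported for subject 1 later in the paper. The limiting value for very long rest periods is included on the assumption that it is obtained simply by letting the term in R vanish.

```python
import math


def predicted_work(rest, b, c):
    """Solve e^(C W) = 1 - B + B e^(-C R) for W (all times in seconds)."""
    return math.log(1.0 - b + b * math.exp(-c * rest)) / c


def limiting_work(b, c):
    """Value of W approached as R grows large, where e^(-C R) tends to zero."""
    return math.log(1.0 - b) / c


B_SUBJECT_1, C_SUBJECT_1 = -0.76, 0.010  # estimates reported for subject 1

for rest_seconds in (10, 20, 60, 120, 200):
    print(rest_seconds, round(predicted_work(rest_seconds, B_SUBJECT_1, C_SUBJECT_1), 1))

print("limit", round(limiting_work(B_SUBJECT_1, C_SUBJECT_1), 1))
```

Note that B must be negative for the argument of the logarithm to exceed 1, which corresponds to energy being expended during work faster than it is replaced.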


The rate of contractions was set at one per second in all three cases. On each contraction, the
limit block was to be touched. Failure to make a complete stroke ended each
work period. In each experiment one subject was used for a number of sessions.
Each session was composed of a “warm-up” period involving three work
periods separated by 60-second rest periods. The first experimental rest period

Instead of having a sequence of equal rest periods and thus determining one
n, each of the selected rest periods

preliminary subject were analyzed and the curve of (32) fits these data about
as well as it does the data for the two subjects reported here.

Mean lengths of work periods following the chosen rest periods are given
in Table 1.
TABLE 1
Experimental Results
(All times are in seconds.)

       Subject 1                    Subject 2
Length of    Length of       Length of    Length of
  Rest         Work*           Rest         Work**

    5          $.8               5          10.0
   10          9.9              10           1.7
   20          2.9              20          21.5
   40         23.5              40          31.8
   60         27.8              60          39.7
   90         37.5              80           5.2
  120          1.5             100          51.0
  160          8.1             120           5,0
  200         50.0             180          67.0
  240         52.9             240          77.8

*  Mean length of work periods over 8 sessions.
** Mean length of work periods over 6 sessions.


The values of $B$ and $C$ for each subject were determined graphically.
In cases where more precise determinations of these constants are desired,
some more precise statistical method of curve fitting might be used. In the
present case we were interested in obtaining only the proper order of magni-
tudes of $B$ and $C$ and felt that there was an advantage in the graphical method
in surveying the properties of the function and the data. A series of trial values
of $C$ were assumed. For each value of $C$, values of $e^{CW}$ and $e^{-CR}$ were obtained.

FIGURE 3
Rectification of Rest-Work Curve for Subject 1

Figure 3 shows graphs for subject 1 for three values of $C$. Each point is
determined by one rest period and the subsequent work period. From (32) it
is expected that the values of $e^{CW}$ and $e^{-CR}$ would be linearly related
for the proper value of $C$. Analysis of (32) also indicates that this line should
pass through the point (1, 1). All three lines drawn in Figure 3 pass through
this point. It is to be noted in Figure 3 that a low value of $C$ yields a negative
curvature and a high value of $C$ yields a positive curvature. A $C$ of .010 seemed
to yield the best approximation to a straight line. A best-fitting line was
drawn by eye with a slope of -.76, thus determining $B$. The line drawn on
Figure 1 for subject 1 is the line for (32) with the values of $B$ and $C$ determined
above.
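
The trial-value procedure can also be sketched numerically. In the Python illustration below, a least-squares line constrained to pass through the point (1, 1) stands in for the line drawn by eye, and a normalized sum of squared residuals stands in for the visual judgment of straightness; both substitutions are assumptions of this sketch and are not the method used in the paper. The function names are hypothetical, and the rest-work pairs are synthetic, generated from (32) with B = -.76 and C = .010 rather than taken from Table 1.

```python
import math


def fit_for_trial_c(rests, works, c):
    """Slope B of the line through (1, 1) and a misfit measure for one trial C."""
    xs = [math.exp(-c * r) for r in rests]   # e^(-C R)
    ys = [math.exp(c * w) for w in works]    # e^(C W)
    # Least-squares slope for the line y - 1 = B (x - 1).
    slope = (sum((x - 1.0) * (y - 1.0) for x, y in zip(xs, ys))
             / sum((x - 1.0) ** 2 for x in xs))
    residual = sum((y - (1.0 - slope + slope * x)) ** 2 for x, y in zip(xs, ys))
    scale = sum((y - 1.0) ** 2 for y in ys)
    return slope, residual / scale


def best_trial_c(rests, works, trial_cs):
    """Return (C, B, misfit) for the trial C giving the straightest plot."""
    fits = []
    for c in trial_cs:
        slope, misfit = fit_for_trial_c(rests, works, c)
        fits.append((misfit, c, slope))
    misfit, c, slope = min(fits)
    return c, slope, misfit


# Synthetic rest-work pairs generated from (32); not experimental data.
rests = [10.0, 20.0, 60.0, 120.0, 200.0]
works = [math.log(1.76 - 0.76 * math.exp(-0.010 * r)) / 0.010 for r in rests]

print(best_trial_c(rests, works, [0.006, 0.008, 0.010, 0.012, 0.014]))
```

With real data the misfit would not vanish for any trial value, and the graphical inspection described above remains useful for detecting systematic curvature.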


FIGURE 4
Rectification of Rest-Work Curve for Subject 2

Figure 4 shows graphs of $e^{CW}$ and $e^{-CR}$ for subject 2 and three trial values
for $C$. While deviations from a straight line do not seem extreme, it is of inter-
est that the points cannot be brought into more or less random fluctuations
around a straight line by any choice of $C$. There seems to be a systematic
wave about the straight line in each graph. This type of curve may result from
the inadequacies of our approximations. A slight suggestion of this effect may
be detected also for subject 1 in Figure 3. These results seem to be related to
the results reported by Féré (1) and by Manzer (2), where performances after
moderate rest periods were superior to performances after longer rest periods.
While the consistency and seriousness of this lack of fit of the present function
is a matter for further study, it is the author's judgment from the present data
that (32) yields the general sweep of the observations. Some different set of
postulates might yield a better fit to the data, or the systematic deviations
may be considered as perturbations to be accounted for by further complexities
in the mathematical model.

Returning to the fitting of (32) to the results for subject 2, values of .008
for $C$ and -.95 for $B$ seemed to give the best fit to the data. The corresponding
curve is drawn in Figure 1.

An asymptote is indicated for each curve in Figure 1. This asymptote
may be determined by setting $R$ equal to infinity in (32). Then $e^{-CR}$ is zero
and $e^{CW}$ equals $(1 - B)$. In the experiment, the first work period exceeded
this asymptote by some 10 to 20 per cent. This is a second indication of an
inadequacy of our formulation which might be corrected by a more complex
set of postulates. Another possibility is to interpret the present function to
apply to the body state following the warm-up period in the experimental
sessions.

Future work with the rational rest-work function can take any of several
lines aside from development of a more adequate (and probably more complex)
function. Individual differences in values of $C$ and $B$ for a fixed experiment
might be correlated with other variables such as age. Effects of such conditions
as ventilation, use of drugs, motivation, and response modality on $C$ and $B$
could be investigated. The experiment could be expanded to include systematic
variation in load on the ergograph and timing of flexions, thus investigating
other characteristics of the present function when $a$ (rate of energy expendi-
ture) and $E_1$ (critical energy level) are varied. It would be hoped that use of
the rational function for the rest-work curve would help in obtaining greater
precision in results for these various types of investigations.

A MEASURE OF INTERRELATIONSHIP FOR OVERLAPPING
GROUPS*


BEN J. WINER 


PURDUE UNIVERSITY 


A coefficient of interrelationship between overlapping groups that
avoids indeterminacies inherent in the construction of fourfold tables for
such purposes and, at the same time, is relatively insensitive to the absolute
magnitude of marginal totals of fourfold tables, is developed. Under assump-
tions that are consistent with the objectives of organizational analysis, this
coefficient is shown to be equivalent to a product-moment correlation co-
efficient. The advantages and limitations of this coefficient are pointed out.
A numerical example giving computational procedures is presented.


Suppose an organization $G$ of individuals is made up of $k$ overlapping
groups $g_1, g_2, \cdots, g_k$. No restriction is placed upon the number of groups
to which an individual may belong. Problems arise in analyzing the structure
of organizations in which the equivalent of a correlation coefficient is needed.
For example, one may seek means for simplifying the group structure of a
complex organization. If one could construct the equivalent of an inter-
correlation matrix, the factorial structure of this matrix might suggest
means for restructuring the groups within the organization in such a way
as to preserve many of the optimal conditions that may be present in the
more complex structure and, at the same time, suggest ways of reducing the
number of groups necessary to accomplish the same general mission.

As a starting point for this analysis, suppose one had the matrix of
observations $X$, whose elements $n_{ij}$ represent the number of individuals in
$G$ who belong to both $g_i$ and $g_j$, i.e.,

$$X = \begin{bmatrix} n_{11} & n_{12} & \cdots & n_{1k} \\ n_{21} & n_{22} & \cdots & n_{2k} \\ \vdots & \vdots & & \vdots \\ n_{k1} & n_{k2} & \cdots & n_{kk} \end{bmatrix}. \tag{1}$$

(Joint occurrence matrices of higher order will not be considered in the
present development. Third-order matrices would have as elements $n_{ijm}$,
i.e., the number of individuals who are members of $g_i$, $g_j$, and $g_m$ simul-
taneously.)

* This measure was developed in connection with a study made by Dorothy C. Adkins
(1), whose influence in the development was felt in many ways. The article was prepared while
the author was employed at The University of North Carolina.

The number of individuals in each of the groups, $n_{ii}$, may show considerable
variation. Up to a certain point, for purposes of analyzing organizational
structure, the relative magnitudes of the $n_{ii}$ are not particularly
important. The coefficient of correlation sought, therefore, is one that is
relatively insensitive to the magnitudes of the $n_{ii}$.


One of the simplest approaches to the problem would be to define the
correlation between $g_i$ and $g_j$ by

$$ r_{ij} = \frac{n_{ij}}{\sqrt{n_{ii}\, n_{jj}}}. \qquad (2) $$

[...] coefficient is equivalent to a product-[...]
[...]her general conditions, it has the [...]
the relative magnitudes of $n_{ii}$ and [...]
adoption.

[...] in a sense, the minus-minus cell of [...]
one possible fourfold table might be [...]
also sensitive to small changes [...]
quickly led to the conclusion that a fourfold
table does not provide a satisfactory starting point for an index of overlapping
group structure. [For a more detailed discussion of problems encountered
in using fourfold tables in related areas see (2, 3, 4).]

An index which, in a real sense, is equivalent to a product-moment
correlation coefficient can be derived from the observation matrix X by a
series of assumptions that are consistent with the objectives of the subsequent
organizational analysis. As a first step in this derivation, one sets up a matrix
P having as elements

$$ p_{ij} = \frac{n_{ij}}{n_{ii}}. $$

If $D_n$ is a diagonal matrix having as diagonal elements $n_{ii}$, the matrix P is
given by

$$ P = D_n^{-1} X. \qquad (3) $$

In general, P will not be a symmetric matrix. The elements of P represent
the proportion of the members of $g_i$ that also belong to $g_j$.
As a second step in the derivation, let $D_p$ be a diagonal matrix having
as diagonal elements the lengths of the row vectors of P, i.e., an element
of $D_p$ is given by $\sqrt{\sum_j p_{ij}^2}$. In order to normalize the rows of P, premultiply
P by $D_p^{-1}$ to obtain

$$ F = D_p^{-1} P = D_p^{-1} D_n^{-1} X = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{bmatrix}, \qquad (4) $$

where $D_n$ is the diagonal matrix of the $n_{ii}$ and $a_i$ is a normalized row vector.
The rows of F can be thought of as unit
vectors in a k-dimensional vector space. Assuming that the basis vectors
for this space are unit orthogonal vectors (justification of this assumption
will be given presently), the elements of the matrix

$$ R = FF' \qquad (5) $$

represent the cosines of the angles between pairs of vectors. The correlation
between $g_i$ and $g_j$ can be defined by

$$ r_{ij} = \cos(a_i, a_j) = a_i a_j', \qquad (6) $$

i.e., the scalar product of two row vectors having unit length.
In the language of factor analysis

$$ F = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{bmatrix} \qquad (7) $$

can be interpreted as a matrix of factor loadings. The rows of this matrix
represent the projections of the groups on a set of [...]
(If, say, group $g_i$ had no overlap with any of the other
[...] would have unity in the ith row and zeroes
[...]) [...] reference vectors, or factors, represent the
[...]lapping groups in the k-space. This interpretation
[...] can be considered a possible justification for the choice of an
orthogonal set of reference vectors rather than an oblique set. The effective
[...] of the common factor space may be less than k; indeed, it is the
objective of studies in this area to find this reduced dimension.

Still another approach to the derivation of this proposed index of over-
[...] suggests possible limitations implicit in it. Consider
[...] F as representing the profile of the joining behavior of the
average individual in $g_i$. Then the index of relationship derived here can be
[...] as a measure of profile similarity between two average individuals.
(The deviations of each of the profiles are essentially measured from a common
[...] in terms of a metric relatively insensitive to $n_{ii}$.) For those groups
which are homogeneous with respect to belonging behavior (i.e., high intra-
[...]), this average individual closely approximates all individuals
[...] group. Where the groups are not particularly homogeneous with
respect to belonging behavior, this average profile (and the index of relationship
based upon it) has somewhat limited value.

As a purely descriptive coefficient, the question of sampling distribution
does not arise. The exact sampling distribution of the coefficient proposed
is a difficult problem in multivariate analysis. If the number of groups is
large and the number of individuals within each group is also large, in spite
of the fact that the coefficient can assume only positive values, it appears
reasonable to assume that the sampling distribution of the multiple correlation
coefficient provides an approximate sampling distribution for the index
proposed here.


Numerical Example

An interesting application of the proposed index is given in (1). A
numerical application is presented here. Suppose an organization G
consists of 5000 members. Further, suppose each member of G may belong
[...] all, or none of six groups $g_i$ (i = 1, ..., 6). Let the number of individuals
belonging to both $g_i$ and $g_j$ be given by the element in the ith row
and jth column of the matrix shown in Table 1. The elements in the main
diagonal of this matrix represent the number of individuals in each of the
groups.

The matrix P (Table 2) is obtained from the matrix X by dividing each
row of X by the corresponding entry in the main diagonal of X. The rows of
P can be regarded as vectors representing $g_i$; the squares of the lengths of
these vectors, $d_{ii}^2$, are obtained by squaring and summing the entries in each
row. The normalized (unit) row vectors (Table 3) are obtained from P by multiplying
the rows of P by $1/d_{ii}$. The sums of the squares of the entries in each
row of F should total unity (within rounding error). The matrix F can be considered
as a matrix of factor loadings. The entries in row i of F represent
the cosines of the angles between $g_i$ and a set of hypothetically independent
groups represented by an orthogonal set of basis vectors. From F one generates
a correlation matrix in the same manner in which a correlation matrix is
obtained from a set of orthogonal factor loadings, i.e., by post-multiplying
F by its transpose.

A short-cut procedure is to compute the matrix XX'. If $e_{ij}$ is the typical
element in that matrix, then an element of the matrix R is given by

$$ r_{ij} = \frac{e_{ij}}{\sqrt{e_{ii}\, e_{jj}}}. $$

In Table 5 the matrix XX' is computed; in Table 6 the typical element is
$\sqrt{e_{ii}\, e_{jj}}$. The matrix R is obtained by dividing each element of Table 5
by the corresponding element in Table 6.
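The computational steps of the numerical example can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the joint-occurrence matrix `X` below is hypothetical (the values of Table 1 are not reproduced here), and the function names `overlap_correlation` and `overlap_correlation_shortcut` are invented for the sketch. The fourth group is deliberately given no overlap with the others, so its row of F should be a unit basis vector and its correlations with the other groups should be zero.

```python
import numpy as np


def overlap_correlation(X):
    """Long route: P = D_n^{-1} X, F = D_p^{-1} P, R = F F'."""
    X = np.asarray(X, dtype=float)
    n = np.diag(X)                         # group sizes n_ii
    P = X / n[:, None]                     # p_ij = n_ij / n_ii
    d = np.sqrt((P ** 2).sum(axis=1))      # lengths of the rows of P
    F = P / d[:, None]                     # unit-length row vectors
    R = F @ F.T                            # cosines between row vectors
    return P, F, R


def overlap_correlation_shortcut(X):
    """Short-cut route: r_ij = e_ij / sqrt(e_ii * e_jj), with E = X X'."""
    X = np.asarray(X, dtype=float)
    E = X @ X.T
    e = np.sqrt(np.diag(E))
    return E / np.outer(e, e)


# Hypothetical joint-occurrence matrix for four groups (not from Table 1).
X = np.array([
    [400, 120,  30,   0],
    [120, 300,  60,   0],
    [ 30,  60, 250,   0],
    [  0,   0,   0, 150],
])

P, F, R = overlap_correlation(X)
R_short = overlap_correlation_shortcut(X)
print(np.round(R, 3))
print(np.allclose(R, R_short))
```

The two routes agree because the product of the two diagonal scalings reduces to division by the length of each row of X, which is the square root of the corresponding diagonal element of XX'.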


REFERENCES


1. Adkins, D. C. The simple structure of the American Psychological Association. Amer.
Psychologist, 1954, 9, 175-180.

2. Carroll, J. B. The effect of difficulty and chance success upon correlations between items
or between tests. Psychometrika, 1945, 10, 1-19.

3. Wherry, R. J. and Gaylord, R. H. Factor pattern of test items and tests as a function
of the correlation coefficient: content, difficulty, and constant error factors. Psychometrika,
1944, 9, 237-244.

4. Wherry, R. J. and Winer, B. J. A method for factoring large numbers of test items.
Psychometrika, 1953, 18, 161-179.


M anuscript received б /96/54 


Revised manuscript received 7/5/54


PSYCHOMETRIKA—VOL. 20, No. 1 
MARCH, 1955 


AN EXTENSION OF ANDERSON’S SOLUTION 
FOR THE LATENT STRUCTURE EQUATIONS 


W. A. GIBSON*


CENTER FOR ADVANCED STUDY IN THE BEHAVIORAL SCIENCES 


Anderson’s solution for the latent structure equations is summarized 
and then extended in two ways so as to involve all items simultaneously. 


Some time ago Lazarsfeld and Dudman (4) achieved a solution, by means 
of determinantal equations, for Lazarsfeld’s latent structure equations. 
Recently their solution was extended by Anderson (1) in such a way as to 
involve matrix manipulations only. Both of these solutions have the ad- 
vantage over that of Green (2) of avoiding the need for estimating unknown 
elements in the manifest matrices—the elements with recurring subscripts. 
They have the disadvantage, however, of using much less of the empirical 
data than does Green’s solution. The purpose of this note is to indicate two 
ways in which Anderson's solution can be extended so as to involve more of
the empirical data and thus compare more favorably with Green's solution
in that regard.

The latent structure equations have been developed elsewhere (3) and
will merely be restated here in matrix form:


$$ R = L'VL, \qquad (1) $$

and

$$ R_k = L'VD_kL, \qquad (2) $$

where R is the sample joint proportions matrix bordered by the [...]
marginals, $R_k$ is the sample triple proportions matrix for item k bordered
by the joint proportions involving item k, L contains the latent [...]
for all items and has its top row filled with 1's, V is diagonal and contains the
relative class sizes in its diagonal cells, and $D_k$ is diagonal and has the
entries from row k of L' in its diagonal cells. All diagonal cells but the first
in R and $R_k$ and all cells in row and column k of $R_k$ are [...]
to be estimated if those [...]
The order of R and $R_k$ is n + 1, n being the number of items.
The rank of the matrices in (1) and (2) is m, the number of latent classes
needed to account for the manifest data.

*This article was written while the author was employed at The University of North
Carolina.

Lazarsfeld (3, p. 389) has defined a basic determinant of R as a determinant
formed from the rows and columns of R in such a way as to include
the first diagonal element in R but no other diagonal element. Thus no basic
determinant in R contains unknown elements. A basic determinant of $R_k$
would be analogously defined and would contain no unknown elements provided
row and column k of $R_k$ were not involved. It is here convenient to
speak of the basic sub-matrices of R and $R_k$ as being the matrices of the basic
determinants. For the present purpose the basic sub-matrices will always be
dealt with in pairs, one from R and the corresponding one from $R_k$. Consequently
the further restriction will be imposed that neither row nor column
k of either R or $R_k$ may be involved in a pair of basic sub-matrices. Finally,
we shall be concerned only with basic sub-matrices of order and rank m.

Let P and $P_k$ represent such a pair of basic sub-matrices. Then, by virtue
of (1) and (2),

$$ P = L_1' V L_2, \qquad (3) $$

and

$$ P_k = L_1' V D_k L_2, \qquad (4) $$

[...] stated, it follows that no item is represented both in $L_1'$ and $L_2$, and that item k
is represented neither in $L_1'$ nor in $L_2$. Item k is, however, represented in $D_k$.
Because of its role in the formation of $R_k$, Lazarsfeld has referred to item k
as the stratifier (cf. 3, pp. 391-392).

Anderson's solution is simply to form the matrix,

$$ A = P^{-1}P_k = L_2^{-1}V^{-1}(L_1')^{-1}L_1'VD_kL_2 = L_2^{-1}D_kL_2, \qquad (5) $$

and then obtain the characteristic roots and the right-sided characteristic
vectors of A to get $D_k$ and $L_2^{-1}K$, where K is an arbitrary diagonal matrix
and remains, for the moment, unknown. Post-multiplying (5) through by
$L_2^{-1}K$ shows that $L_2^{-1}K$ gives the right-sided characteristic vectors of A and
that $D_k$ contains the latent roots of A. Thus,

$$ AL_2^{-1}K = L_2^{-1}D_kL_2L_2^{-1}K = L_2^{-1}D_kK = L_2^{-1}KD_k. \qquad (6) $$

Post-multiplying (3) by $L_2^{-1}K$ gives

$$ PL_2^{-1}K = L_1'VL_2L_2^{-1}K = L_1'VK. \qquad (7) $$

[...] multipliers on its columns. These
multipliers turn out to be simply the entries in the first row of $L_1'VK$, since
the first row of $L_1'$ must contain only 1's. $L_1'$ is thus obtained from the relationship,

$$ L_1' = (L_1'VK)(VK)^{-1}. \qquad (8) $$

Given $L_1'$, the matrix product $VL_2$ is obtained from (3) as follows:

$$ VL_2 = (L_1')^{-1}P. \qquad (9) $$

Both V and $L_2$ can now be obtained because the first column in $VL_2$
contains the diagonal elements of V.

In this form the solution by Anderson involves only 2m - 1 of the
items: the m - 1 items represented in $L_1$, the m - 1 items in $L_2$, and the
stratifier k. There are two ways in which Anderson's solution can be extended
to involve all of the items. No unknown elements will be introduced into the
manifest matrices that are used. One way is to use a composite stratifier
consisting of some combination of any of the items that are not represented
in $L_1$ or $L_2$. The other way is to augment the basic sub-matrices (hence also
$L_1'$) by additional rows representing all items not involved in $L_2$ or in the
stratifier.
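Equations (5) through (9) translate almost line for line into matrix code. The following is a minimal sketch, not Anderson's or the author's own procedure: the latent parameters are hypothetical values chosen so that the basic sub-matrices can be built exactly, and the function name `anderson_solution` is invented for the illustration. Because the order of the characteristic roots is arbitrary, the sketch sorts them in descending order, and the hypothetical stratifier probabilities are listed in the same order so that the recovered classes line up with the true ones.

```python
import numpy as np


def anderson_solution(P, Pk):
    """Recover D_k, L1', V and L2 from a pair of basic sub-matrices."""
    A = np.linalg.solve(P, Pk)              # A = P^{-1} P_k, eq. (5)
    roots, W = np.linalg.eig(A)             # columns of W: L2^{-1} K, eq. (6)
    order = np.argsort(-roots.real)         # fix an (arbitrary) class order
    d_k = roots.real[order]
    W = W.real[:, order]
    M = P @ W                               # L1' V K, eq. (7)
    L1t = M / M[0]                          # divide columns by first row, eq. (8)
    VL2 = np.linalg.solve(L1t, P)           # V L2 = (L1')^{-1} P, eq. (9)
    v = VL2[:, 0]                           # first column holds diag(V)
    L2 = VL2 / v[:, None]
    return d_k, L1t, v, L2


# Hypothetical latent structure with m = 3 classes.
v_true = np.array([0.5, 0.3, 0.2])                  # relative class sizes
L1t_true = np.array([[1.0, 1.0, 1.0],               # row of 1's, then 2 items
                     [0.9, 0.4, 0.2],
                     [0.8, 0.5, 0.1]])
L2_true = np.array([[1.0, 0.7, 0.2],                # column of 1's, then 2 items
                    [1.0, 0.3, 0.9],
                    [1.0, 0.6, 0.5]])
dk_true = np.array([0.85, 0.60, 0.35])              # stratifier item, by class

V = np.diag(v_true)
P = L1t_true @ V @ L2_true                          # eq. (3)
Pk = L1t_true @ V @ np.diag(dk_true) @ L2_true      # eq. (4)

d_hat, L1t_hat, v_hat, L2_hat = anderson_solution(P, Pk)
print(np.round(d_hat, 4), np.round(v_hat, 4))
```

With error-free proportions the latent parameters are recovered up to rounding; with sample data the characteristic roots need not be real or lie between 0 and 1, which the sketch does not guard against.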
The composite stratifier will be considered first. Let the subscript $kl\ldots$
stand for a combination of any of the items that are not involved in $L_1$ or $L_2$.
Then the sum of the corresponding triple proportions matrices is given by

$$ R_{kl\ldots} = R_k + R_l + \cdots = L'VD_kL + L'VD_lL + \cdots = L'V(D_k + D_l + \cdots)L = L'VD_{kl\ldots}L. \qquad (10) $$

By analogy with (10) the latent structure equation for a basic sub-matrix
in $R_{kl\ldots}$ is

$$ P_{kl\ldots} = L_1'VD_{kl\ldots}L_2. \qquad (11) $$

$P_{kl\ldots}$ can be used in Anderson's solution in exactly the same way as is $P_k$.
In the special case where the subscript $kl\ldots$ refers to all of the items not
involved in $L_1'$ and $L_2$, it turns out that there is only one possible $P_{kl\ldots}$.
A pre-publication reviewer has suggested that any weighted sum, and
not just the simple sum, of $R_k$ and $D_k$ matrices could represent a composite
stratifier.

Now consider the second way in which Anderson's solution can be
extended. Let P and $P_{kl\ldots}$ be augmented by additional rows representing all
of the items that are not represented in $L_2$ or in the (single or composite)
stratifier. Thus P, $P_{kl\ldots}$, and $L_1'$ cease to be square, but (3) and (11) still
hold, and all other matrices in those equations remain square.

Now form the matrix

$$ \begin{aligned} B &= (P'P)^{-1}P'P_{kl\ldots} \\ &= (L_2'VL_1L_1'VL_2)^{-1}L_2'VL_1L_1'VD_{kl\ldots}L_2 \\ &= L_2^{-1}V^{-1}(L_1L_1')^{-1}V^{-1}(L_2')^{-1}L_2'VL_1L_1'VD_{kl\ldots}L_2 \\ &= L_2^{-1}D_{kl\ldots}L_2. \end{aligned} \qquad (12) $$

The last step of (12) is identical with that of (5), except that the stratifier
may here be composite. Thus the solution is the same as for Anderson from
this point on, except that (9) is replaced by

$$ VL_2 = (L_1L_1')^{-1}L_1P, \qquad (13) $$

because $L_1'$ is no longer square.

It is perhaps worth mentioning that this extension of Anderson's solution
can be shown to have two least-squares properties. The first is that the
matrix B in (12) is such as to minimize the sum of squared discrepancies
between the matrices $P_{kl\ldots}$ and PB. The second is that the matrix $VL_2$ in
(13) is such as to minimize the sum of squared discrepancies between the
matrices P and $L_1'VL_2$.
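The extended solution, with a composite stratifier and augmented (rectangular) sub-matrices, can be sketched in the same way. The two least-squares properties mentioned above are what make `numpy.linalg.lstsq` a natural fit for equations (12) and (13). As before, this is an illustration under assumptions: all latent values are hypothetical, the function name `extended_solution` is invented, and the data are error-free so that the recovery can be checked exactly.

```python
import numpy as np


def extended_solution(P, Pkl):
    """Least-squares version of Anderson's solution for rectangular P."""
    # B = (P'P)^{-1} P' P_kl minimizes ||P_kl - P B||^2, eq. (12).
    B, *_ = np.linalg.lstsq(P, Pkl, rcond=None)
    roots, W = np.linalg.eig(B)
    order = np.argsort(-roots.real)
    d_kl = roots.real[order]
    W = W.real[:, order]
    M = P @ W                               # L1' V K, now rectangular
    L1t = M / M[0]                          # first row of L1' is all 1's
    # V L2 = (L1 L1')^{-1} L1 P minimizes ||P - L1' (V L2)||^2, eq. (13).
    VL2, *_ = np.linalg.lstsq(L1t, P, rcond=None)
    v = VL2[:, 0]
    L2 = VL2 / v[:, None]
    return d_kl, L1t, v, L2


# Hypothetical latent structure: m = 3 classes, four items in the augmented L1'.
v_true = np.array([0.5, 0.3, 0.2])
L1t_true = np.array([[1.0, 1.0, 1.0],
                     [0.9, 0.4, 0.2],
                     [0.8, 0.5, 0.1],
                     [0.3, 0.7, 0.6],
                     [0.6, 0.2, 0.9]])
L2_true = np.array([[1.0, 0.7, 0.2],
                    [1.0, 0.3, 0.9],
                    [1.0, 0.6, 0.5]])
# Composite stratifier: simple sum of two items' latent probabilities.
dkl_true = np.array([0.85, 0.60, 0.35]) + np.array([0.70, 0.20, 0.10])

V = np.diag(v_true)
P = L1t_true @ V @ L2_true                           # eq. (3), 5 x 3
Pkl = L1t_true @ V @ np.diag(dkl_true) @ L2_true     # eq. (11), 5 x 3

d_hat, L1t_hat, v_hat, L2_hat = extended_solution(P, Pkl)
print(np.round(d_hat, 4), np.round(v_hat, 4))
```

Note that the roots of B estimate the summed probabilities of the composite stratifier, so they can exceed 1; the individual items of the stratifier are not separated by this step.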

A few remarks may be in order as to which items should be involved in
[...] solution by Green has this same problem (2, p. 158).
Finally, all items not involved in the chosen basic sub-matrix of R nor in the
stratifier are thrown into the extra rows of P and $P_{kl\ldots}$.

After an appropriate P and $P_{kl\ldots}$ have been formed according to the
requirements mentioned in the previous [...]
matrix $(L_1L_1')^{-1}L_1P$ and divide each of [...]


MENTAL ABILITIES AND PERSONALITY TRAITS*


CALVIN W. TAYLOR†


The relationship between measures of verbal fluency and certain personality
traits is examined by factor techniques. From a matrix of eight factor
scores derived from mental tests plus five personality scores, six factors were
obtained. An oblique solution lends limited support to the hypothesized
relationship between the two domains.


In factorial studies of abilities, it has become general practice to include 
two or three “anchor” tests to measure each of the primary mental abilities 
that might be related to the experimental variables. In this way, one or 
more new factors may be isolated and interpreted with each successive 
well-planned study. The “anchor” tests which will probably best measure 
each of the several “established” factors can be identified fairly well. 

This is not yet the case, however, for the area of temperament and 
personality, where there is much less agreement upon “anchor” variables. 
Using several different approaches, Cattell (1) has rather consistently found 
ten to twelve factors. Guilford has developed three inventories, STDCR, 
GAMIN, and I, which represent end products of his efforts to measure 
temperament factors. Although no serious effort has been made to compare 
the works of these authors, it seems that some of their factors may be quite 


similar while others apparently do not appear in both sets.

Few studies have straddled mental abilities and personality traits, even
though the nature of any relationships found would be of considerable
theoretical and practical importance. Thornton (9) found practically no
overlap between tests of mental abilities and four questionnaire-type variables
which measured a single factor called “Feeling of Adequacy.” Other studies
likewise find little relationship (2). There is supporting evidence, however,
for the hypothesis that fluent persons tend to be independent, extraverted,
[...]

The present study is an effort to help define the relationships between
mental abilities and personality traits, the latter being measured by a questionnaire.
The major hypothesis was that there would be some relationship
between measures of verbal fluency and extraversion or rhathymia. Studies
by Cattell and other British investigators (6) would tend to support this
hypothesis. The relationship among the mental ability scores was also of
interest since this involved, in a sense, a second-order factor study of eight
cognitive factors. The personality score intercorrelations, which are admittedly
distorted by experimental-dependence conditions in scoring (3),
were of minor interest.

*This study was supported by a grant from the Research Foundation of the University
of Utah. [...] Research Council.


The Variables 


Data on twenty-eight mental ability tests and on a personality inventory 
were collected by Taylor, the mental ability tests furnishing the basis for 
his study of fluency (8). For the present study, the fifteen tests were selected 
which best measured Taylor’s eight primary abilities. Scores on two tests 
measuring the same factor were combined with equal weights to obtain a 
single index. (In the case of the Perceptual Speed factor, only one test was 


used.) These eight factor indices were included with five scores from a 
personality inventory for this study. 


The eight factor indices, for which the tests are described by Taylor,
were as follows:


1. Memory (First Names, Word-Number)
2. Perceptual Speed (Identical Numbers)
3. Reasoning (Letter Series, Letter Grouping)
4. Number (Addition, Multiplication)
5. Verbal Comprehension (Same or Opposite, Completion)
6. Word Fluency (First and Last Letters, Suffixes)
7. Verbal Versatility (Similes, Letter Star)
8. Ideational Fluency (Topics, Theme)


The remaining five personality variables from Guilford’s “Inventory of
the Factors STDCR” were:


9. Social Introversion
10. Thinking Introversion
11. Depression
12. Cycloid Tendency
13. Rhathymia


The data were obtained on 170 high-school seniors in Washington, D.C.
The score distributions on the eight factor scores were normalized.


Procedures and Results


The matrix of correlation coefficients for the 13 variables was analyzed by
Thurstone’s group centroid method. Six factors were extracted and an
oblique rotational solution obtained.


TABLE 1
Intercorrelations (above diagonal) and Residuals (below diagonal)

2 3 4 5 6 7 8 9 10 11 12 13
0? 35 21 22 19 ?? 17 -07 10 -10 -08 -0?
34 3? 24 23 26 27 -20 07 01 05 22
06 33 51 36 ?? 33 -12 -12 -19 -10 14
02 -03 18 30 21 22 -16 +10 -11 -07 22
03 05 -03 36 30 38 0? 0? -18 -18 -05
-04 -02 03 02 32 32 -07 05 -01 0? 07
-02 01 03 -01 -02 50 16 31 0? 16 27
-01 -01 03 -0? -01 -03 -17 16 -05 -0? 10
?? 01 -02 05 -01 -02 00 -01 53 48 -12
-01 00 01 -03 -02 02 01 00 03 90 -08
-02 -01 00 00 93 -02 01 03 04 01 22
13 00 -01 -01 02 02 ?? 02 -01 -08 -0? -02 -03


TABLE 2
The Unrotated Factor Matrix

        I    II   III    IV     V    VI    h²
  1    38   -02   -10    03    38   -22    35
  2    46    0?    35    ?0   -26   -15    3?
  3    70   -14   -01   -20    20    ??    59
  4    18   -09    11   -18   -13   -33    ??
  5    63   -12   -27    00    01    22    53
  6    55    97   -0?   -04   -02   -01    31
  7    6?    16    15    09   -02    33    56
  8    62    06   -01    28   -13    20    52
 -9    20   -29    66    34    02    36    10
 10    06    63   -08    39    04   -10    57
 11   -13    95    3?   -21   -07    93    96
 12   -0?    92    ??   -26    05    12   100
 13    02    16   -11   -07    95    65


The intercorrelations are presented in Table 1 along with the sixth-
factor residuals (below the principal diagonal). The correlations among the
personality scores were highest, as might be expected. In general, those among
the mental abilities were next in size and those between mental abilities and
personality traits were the lowest.

Table 2 presents the centroid matrix and Table 3 the factor matrix
after rotation to simple structure. In these two tables and in all discussions
hereafter, variable 9, Social Introversion, is treated as a reflected variable
and is labeled -9 and called Social Extraversion, for convenience. Table 4
presents the final transformation matrix and Table 5 the intercorrelations
between the reference vectors.


TABLE 3
Final Rotated Matrix

        A     B     C     D     E     F    Variable
  1   -02    47    00    03   -06    16    Memory
  2    40   -10    16    06    06    06    Perceptual Speed
  3    11    36    33   -06    07   -10    Reasoning
  4    46    10    00   -03   -01   -03    Number
  5    04    08    57   -12   -14    0?    Verbal Comprehension
  6    18    11    31    07    0?    05    Word Fluency
  7   -05    00    61    09    29    12    Verbal Versatility
  8    06   -10    57   -05    06    28    Ideational Fluency
 -9    00    02   -08   -39    51    39    Social Extraversion
 10   -01    02    03    50   -03    54    Thinking Introversion
 11    09   -06   -02    94    01    00    Depression
 12   -05    05    01    93    42   -01    Cycloid Tendency
 13    0?   -03    0?    00    71    01    Rhathymia


TABLE 4
Final Transformation Matrix

         A     B     C     D     E     F
  I     27    20    57   -01    06    13
  II   -2?   -18    14   -29   -05    00
  III  -68    92   -17    10    23    ??
  IV   -64   -28    78   -08    3?   -27
  V     0?    01    02    95    12    25
  VI   -07   -03   -15   -?0    90    13


TABLE 5
Reference Vector Cosines

         A     B     C     D     E
  B    -35
  C    -2?   -27
  D     09    17   -10
  E    -?0    12    12    08
  F    -08    04   -04    00    0?

The six factors in the rotated solution include three factors involving
mental abilities and three mainly concerned with personality traits. None of
the personality scores had loadings on the mental ability factors, but
[...] did have loadings of almost .30 on each of two [...]
[...] loadings greater than .25 are shown below for each
[...]
either primary or second-order [...]


Factor A

4. Number .46
2. Perceptual Speed .40

[...] a number, or perhaps a speed, factor. The appearance
of Perceptual Speed (Identical Numbers test) is not surprising. In Taylor’s
[...] study the same variable had a loading of .24 on the number
[...] designed to measure such abilities as Perceptual Speed or
Carefulness which involve the manipulation of numbers frequently have
significant projections on a number factor.


Factor B

1. Memory .47
3. Reasoning .36

Factor B is tentatively designated as a memory factor. Some of the
studies in the field have shown that tests of reasoning ability are
[...]ated to memory (2, p. 148).


Factor C

7. Verbal Versatility .61
8. Ideational Fluency .57
5. Verbal Comprehension .57
3. Reasoning .33
6. Word Fluency .31

This factor approaches a general factor of mental ability best represented
by verbal tests, particularly by measures of fluency which involve the meaning
of words.


Factor D

11. Depression .94
12. Cycloid Tendency .93
10. Thinking Introversion .50
-9. Social Extraversion -.39

This factor approaches a general (to this battery) measure of personality,
each of the variables except Rhathymia having projections on it. It may be
tentatively interpreted as Depression. Items dealing with “moodiness,”
“feelings easily hurt,” “lost in thought,” and “self-conscious” are typical of
the depressive-type item contained in common in the S, T, C, and D scoring
keys. Items such as these can account for much of the variance in this factor.
The negative loading of variable 9, which was reflected prior to factoring,
means that the unreflected variable, Social Introversion, is positively related
to this factor. Factor D corresponds closely to Lovell’s (4) factor which was
called “Emotionality,” or the opposite pole of Thurstone’s (10) “Emotional
Stability” factor.


Factor E

13. Rhathymia .71
-9. Social Extraversion .51
12. Cycloid Tendency .42
7. Verbal Versatility .29

Surgency is probably the best interpretation that can be given to this
factor, in spite of the leading variable. Cattell has pointed out the similarity
between Surgency and Rhathymia. This interpretation is supported by the
positive loading of Social Extraversion and by the fact that it fits Studman’s
definition of a fluent person. In many ways it corresponds to the “Drive”
factor found by Lovell. The loading of Verbal Versatility lends limited support
to the original hypothesis. The correlation of .27 between Rhathymia and
Verbal Versatility was larger than any other correlation cutting across the
cognitive and personality domains. The other two types of fluency, Ideational
Fluency and Word Fluency, showed no relationship with this “Surgency”
factor.


Factor F

10. Thinking Introversion .54
-9. Social Extraversion .39
8. Ideational Fluency .28

This rather ambiguous factor is not strongly determined and is difficult
to interpret. Abstracting from Guilfo[...]
represent “meditative thinki[...]
in addition to “entering into [...]
of ideas.” Such a trait confi[...]


Discussion


[...] analysis reported by Rimoldi (5). Relationships in such studies generally
[...]gnified. This provides another reason for considering the interpretations
of the factors as tentative.

The hypothesis that there is a relationship between fluency scores and
certain personality characteristics is supported to a limited degree. The
[...] relevant to this hypothesis is as follows: the fluency measure,
Verbal Versatility, had a projection of .29 on the Surgency factor (E) and
correlated .27 with Rhathymia. Ideational Fluency had a projection of .28
on a [...] personality factor (F); Word Fluency had zero loadings
on the three personality factors, and all personality scores had zero loadings
on the three mental ability factors. The results for the remaining mental
ability variables are consistent with those of other investigations, in which
many zero and a few low relationships between the mental ability and 
Personality areas are reported. Improvement in test construction in both 
domains and further analyses may lead to higher correlations and also to 


greater insight into the bases of any relationships that appear.


A TABULAR METHOD OF OBTAINING TETRACHORIC r WITH
MEDIAN-CUT VARIABLES


GEORGE SCHLAGER WELSH


THE UNIVERSITY OF NORTH CAROLINA


A method is presented that enables the immediate determination of
tetrachoric r from a table if the proportion in the plus-plus cell for median-cut
variables is known.


There is an ever-increasing use of factor analysis, cluster analysis, and 
related techniques in psychological research. Since numerous coefficients 
of correlation are required for the matrices, many investigators have employed 
tetrachoric r’s and have utilized various short-cut methods for obtaining 
these coefficients. This seems to be especially prevalent in preliminary 
investigations where the greater exactitude of more time-consuming methods 
of determining correlation is not feasible. 

In many cases continuous distributions are dichotomized; it is often
possible to make the cuts at the medians. The writer was able to divide at
the median 24 of the 26 variables employed in a recent problem. To facilitate
the determination of tetrachoric r a table was prepared so that the coefficient
could be determined immediately if the proportion in the plus-plus cell were
known (Table 1).

The table was prepared by using the computing diagrams of Chesire,
Saffir, and Thurstone (1). To use these diagrams data are arranged in a
fourfold table as follows:

[...]

From the diagram for a = .50 the value of c corresponding to a particular
value of $r_{tet}$ was determined by noting where the $r_{tet}$ curves from .10 through
.95 cut the vertical line for p = .50. The proportions for 1.00 and .00 are, of
course, known. These twelve points then described a curve with $r_{tet}$ from
.00 to 1.00 on the ordinate and proportions from .50 to .25 on the abscissa.
These points were located on a large (26 by 30 inch) sheet of graph paper
and a smooth curve drawn. From the graph the 250 three-place $r_{tet}$'s for
the corresponding proportions were determined.


TABLE 1
Three-Place Tetrachoric r Corresponding to Proportion in Plus-Plus
Cell for Median-Cut Variables

Proportions

000 001 002 003 004 005 006 007 008 009

688 693 697 702 707 71? 716 721 725 730
639 6?? 651 659 ?? 66? 67? 678 683
589 591 599 ?? 6?? 619 62? 6?? 63?

Decimal points properly preceding each entry have been eliminated.


To insure accuracy a check was made by computing from formula the
[...]

$$ \text{proportion} = .15915504\left(r_{tet} + \frac{r_{tet}^3}{6} + \frac{3\,r_{tet}^5}{40}\right) + .250. $$

[...] obtained graphically agreed to two places with
only rounding errors in the third place. Values above $r_{tet}$ = .80 could not
be checked by means of the shortened formula but this section of the curve
was redrawn on a larger scale and the values checked.
Table 1 is used in the following way:

(1) when both variables are cut at the medians and arranged in a
fourfold table, determine the proportion to three places falling in the plus-
plus cell;

(2) this proportion is found in the marginal entries of the table;

(3) entering the table read off the $r_{tet}$ to three places from the body of
the table (see example A).

(4) if the proportion in the plus-plus cell is less than .250, the $r_{tet}$ will
be negative. In this case use the proportion in the plus-minus cell (or .500
minus the plus-plus proportion) and place a minus sign before the obtained
$r_{tet}$ (see example B).

Examples:

(A)
    +     67    33   100
    -     33    67   100
         100   100   200        67/200 = .335

At the intersection of row 330 and column 005 read off $r_{tet}$ = .509

(B)
         150   150   300        40/300 = .133,  110/300 = .367

At the intersection of row 360 and column 007 read off $r_{tet}$ = -.674
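For readers who want to check table entries without the table, the relationship can be sketched directly. The sketch rests on an assumption not stated in the text: for two normally distributed variables both cut at the median, the plus-plus proportion is 1/4 + arcsin(r)/(2π), so that r = sin(2π(p - .25)). On that reading, the checking formula quoted above is the first three terms of the arcsine series multiplied by 1/(2π). The function names are invented for the illustration.

```python
import math


def tetrachoric_r_median_cut(p_plus_plus):
    """Tetrachoric r from the plus-plus proportion when both cuts are at the median.

    Assumes an underlying bivariate normal distribution; proportions below
    .250 give a negative r automatically, so no sign rule is needed.
    """
    if not 0.0 <= p_plus_plus <= 0.5:
        raise ValueError("plus-plus proportion must lie between 0 and .5")
    return math.sin(2.0 * math.pi * (p_plus_plus - 0.25))


def proportion_from_series(r):
    """Three-term arcsine series for the plus-plus proportion (positive r)."""
    return (r + r ** 3 / 6.0 + 3.0 * r ** 5 / 40.0) / (2.0 * math.pi) + 0.250


r_a = tetrachoric_r_median_cut(67 / 200)    # example A, proportion .335
r_b = tetrachoric_r_median_cut(40 / 300)    # example B, proportion .133
print(round(r_a, 3), round(r_b, 3))
print(round(proportion_from_series(0.509), 3))
```

Example A reproduces the tabled .509. For example B the exact expression gives a value of about -.671 rather than the tabled -.674; the small difference is in the direction one would expect from a table that was read from a smoothed graph and checked against a truncated series, though that explanation is a conjecture.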
REFERENCE


AN IBM METHOD FOR COMPUTING INTRASERIAL
CORRELATIONS*


M. CARR PAYNE, JR.
AND
LEONARD STAUGAS


UNIVERSITY OF ILLINOIS


A method for computing intraserial correlations using a 602-A Calculating
Punch, an 077 Collator, a 513 Gang Punch, and a 403 Tabulator is described. 
An example of the use of the procedure and an estimate of the time needed with 
each machine are given. This procedure is compared with another method, 
which makes use of a more powerful IBM machine. 


Introduction 


In a recent article, Grant (3) described an experimental approach to 
behavior as a time series which new developments and adaptations of quanti- 
tative techniques have made possible. In the fields of psychophysics and motor 
performance, the new techniques (1, 2, 6) typically have dealt with binary 
data, e.g., did a subject see a light flash or not. A more generally useful technique of time series analysis is one which uses continuous data. One such technique obtains correlations by calculating Pearson product-moment correlations within the series (intraserial correlations). An automatic method for computing these correlations with typical IBM equipment is described below.


Computing intraserial correlations requires the pairing of measures which were separated by any stated number of measures in the original series. Lag x (or z,) is the case in which an event is displaced x events from the one with which it is paired in obtaining a serial correlation (or autocorrelation). To find the correlation coefficient for lag 1 over a series of N measures, it is necessary to pair measure 1 with measure 2, 2 with 3, 3 with 4, ...; N − 1 with N. The n for computing the correlation is equal to N minus the lag number. If the correlations are to be obtained with IBM machines of the order of the 602-A Calculating Punch, the data must be so organized that each score appears on the same punched card as the score with which it is to be paired in the correlation computation. The following procedure was worked out using the 602-A Calculating Punch, the 077 Collator, the 513 Gang Punch, and the 403 Tabulator. [A very interesting approach to this same problem, using a more powerful IBM calculator, is described in Schipper and Gruenberger (5).]

*This research was supported in part by the United States Air Force under Contract No. AF [contract number illegible in source], monitored by the Air Force Personnel and Training Research Center. Permission is granted for reproduction, translation, publication, use and disposal in whole and in part by or for the United States Government.
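The pairing rule stated above (measure 1 with measure 1 + lag, and so on, giving n = N minus the lag number) can be sketched in a few lines. This is only an illustration of the pairing logic in software; the function name `lagged_pairs` is hypothetical and has no counterpart in the punched-card procedure.

```python
def lagged_pairs(series, lag):
    """Pair each measure with the one `lag` positions later in the series.

    Returns a list of (X, Y) tuples; its length is N minus the lag number.
    """
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be at least 1 and smaller than the series length")
    return list(zip(series[:-lag], series[lag:]))

# Lag 1 over five measures pairs 1 with 2, 2 with 3, 3 with 4, 4 with 5.
example_pairs = lagged_pairs([1, 2, 3, 4, 5], 1)
```

For a series of 120 readings, lags 1 through 10 give n's from 119 down to 110.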


Outline of the Procedure 


In general terms, the procedure is as follows: The data are punched in 
conventional form across the card, several cards usually being needed to con- 
tain the measures for one series. Punching begins in a column which depends 
on the number of lags to be computed and the number of digits in each meas- 
ure. (See the boxed-in entries on the original cards in Fig. 1). In addition to 
the measures themselves, each card contains appropriate identification of the 
data. A certain number of blank cards depending on the location of the first 
punched column, and the number of digits per measure are now inserted 
behind every original data card (see “inserted cards” in Fig. 1). Measures are 
punched from the original card into these blank cards using the interspersed 
master-card gang-punching principle. As this gang-punching is carried out, 
each measure is offset to the left by one measure on each succeeding card. 
Thus, if each score contains two digits, a score in columns 23 and 24 on the 


workers, was described several years ago by Hartley (4). When the original 
cards and the inserted cards have been passed through the Gang Punch once, 
the first columns through the deck, depending on the number of digits per 


The next step is to construct a set of "answer" cards, each of which is eventually to contain the computations relating to one correlation. An answer [...] representing a series of measures. The whole deck is then passed through the 602-A Calculating Punch. The 602-A is wired to accumulate the N, $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$, and $\sum XY$ for the [...] in the series of measures.


When all the correlation answer cards have been assembled they are put through the 602-A twice more, using two panels which compute the square of the Pearson product-moment coefficient, using the raw score formula. In the process, the squared correlations are punched into available unused columns on the answer cards. The answer cards are then merged with a table of cards [...] values of r, and using the interspersed master-card gang-punching principle again, the proper r is found and punched directly [...]. The answer cards are then ordered appropriately and a table of the correlations is prepared on the Tabulator.
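The accumulation of components and the raw score formula can be illustrated as follows. The sketch computes the squared coefficient from N, ΣX, ΣY, ΣX², ΣY², and ΣXY, as the text describes. Taking the sign of r from the numerator is an assumption made here for completeness, since the source does not say how the sign was recovered in the table lookup. Function names are hypothetical.

```python
import math

def components(pairs):
    """Accumulate N, sum X, sum Y, sum X^2, sum Y^2, sum XY over (X, Y) pairs."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    return n, sx, sy, sxx, syy, sxy

def r_from_components(n, sx, sy, sxx, syy, sxy):
    """Raw score formula: returns (r squared, r)."""
    numerator = n * sxy - sx * sy
    denominator = (n * sxx - sx * sx) * (n * syy - sy * sy)
    if denominator == 0:
        raise ValueError("correlation undefined: one variable has zero variance")
    r_squared = numerator * numerator / denominator
    # Assumption: the sign of r follows the sign of the cross-product term.
    r = math.copysign(math.sqrt(r_squared), numerator)
    return r_squared, r
```

Keeping the six components together mirrors the role of the answer card, from which other statistics (means, variances) can be derived later.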


An Example 


As an example of the use of this procedure we may cite the intraserial correlations computed for lags 1 through 10 for some brightness matching data. The data were available as two-digit scale readings. They were punched 20 per card, beginning in column 23. Suitable identification was punched in columns 63-67. Different series of the data varied in the number of readings; for a series of 120 readings, for example, it was necessary to use 6 cards. The first card of each series was identified with "X" for interspersed master-card gang-punching. The Collator was used to insert 19 blank cards behind each original card except the last one of the series, which was followed by 29.
Columns 3 through 62 of the punch brushes of the Gang Punch were wired into columns 1 through 60 of the punch magnets so as to [...] the off-set gang-punching from each card into the next. The columns containing identification were wired to gang-punch normally and the "X" was wired for interspersed master-card gang-punching. The Gang Punch [...] the values on each card and punched them into positions two columns to the left on the succeeding card. By the time the first nineteen inserted cards had been punched, the data were displaced to the left far enough so that columns 1 through 22 on the next card, the second original card, were punched with values which serially preceded those in columns 23 through 62 on that card (as in Fig. 1). This off-set punching was continued until the last two responses of the data were punched in columns 1, 2, 3, and 4 of the last card in the deck.
After all the cards had been passed through the Gang Punch, the first 11 cards of each series (the original punched card and the first ten inserted cards) were removed and discarded as they did not contain punches in columns 1 and 2. Now the first card (inserted card no. [...]) [...] 20th reading in the first 40 columns, [...] second card contained readings 2 [...], [...] last card contained the last two readings. It was now possible to obtain lag 1 correlations [...] for the X value and columns 3 and 4 for the [...] passed through the [...]. [...] the first card, measures 1 and 2 were in these columns [...] card, etc. The 602-A Calculating Punch calculated and punched the N, sums, sums of squares, and sums of cross products for each correlation into an appropriately identified answer card that was inserted at the end of each series.

After all the correlation components had been obtained for a particular lag, the file of cards was put through the Collator to remove the old answer cards, insert answer cards for the next lag, and remove those score cards which were no longer appropriate. These inappropriate score cards were those which did not contain punches in all the columns being considered for the new Y values. This calculating and collating process was repeated for each of ten lags. To check on the efficiency of the Collator, it was found to be worthwhile to examine visually and to count all rejected score cards.

At this point, each answer card contained the components necessary for 
computing the correlation coefficient for a particular lag. These computations 
were carried out as described above, and the resulting r's and components
were tabulated.

Calculation Times 

The time needed on each machine to calculate 280 correlations with N's ranging from 119 to 110 (120 original responses correlated for 10 lags) in the study used in this example is summarized below.

Key Punch       .5 hour
Calculator    26.5 hours
Collator       3.0 hours
Sorter         1.0 hour
Tabulator       .5 hour
Gang Punch     3.0 hours
Total time    34.5 hours
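As a small arithmetic check on the figures above, the machine times can be summed and converted to an average time per correlation. The per-correlation figure is derived here and does not appear in the source; variable names are illustrative.

```python
# Machine times in hours, as listed in the summary above.
machine_hours = {
    "Key Punch": 0.5,
    "Calculator": 26.5,
    "Collator": 3.0,
    "Sorter": 1.0,
    "Tabulator": 0.5,
    "Gang Punch": 3.0,
}

total_hours = sum(machine_hours.values())
correlations = 280

# Average machine time per correlation, in minutes.
minutes_per_correlation = total_hours * 60 / correlations
calculator_share = machine_hours["Calculator"] / total_hours
```

On these figures the Calculating Punch accounts for roughly three quarters of the total machine time.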


Discussion 


This computing procedure provides no machine check on results obtained at any stage. In the work cited above, suspicious coefficients were recalculated through the entire procedure and also, as a check, a few non-suspicious coefficients were randomly selected and calculated on a desk calculator. None of these checked coefficients or components was found to be in error. The procedure has the advantage that all of the components of each correlation coefficient are punched into one "answer card," which makes it easy to use these values in other calculations where they may be needed.

For data where each measure is known very precisely and contains a large number of significant digits, the procedure outlined by Schipper and Gruenberger (5) using a more powerful calculator is probably more desirable than the present one. However, in most psychological research the number of significant digits in each measure is small. In this case the difference in time per correlation between the two procedures does not warrant the use of the more high-powered calculator. The present procedure, for reasons of ready availability of the equipment needed, simplicity, and ease of understanding, is probably the more satisfactory one for most psychological research.
A distinction is drawn between the method of principal components developed by Hotelling and the common factor analysis discussed in psychological literature both from the point of view of stochastic models involved and problems of statistical inference. The appropriate statistical techniques are briefly reviewed in the first case and detailed in the second. A new method of analysis called the canonical factor analysis, explaining the correlations between rather than the variances of the measurements, is developed. This analysis furnishes one out of a number of possible solutions to the maximum likelihood equations of Lawley. It admits an iterative procedure for estimating the factor loadings and also for constructing the likelihood criterion useful in testing a specified hypothesis on the number of factors and in determining a lower confidence limit to the number of factors.


1. Introduction 


Whatever may be the arguments for or against factor analysis as a tool of psychological research, the statistical problems it involves have been of considerable interest to the statistician mainly because of their complexity. Two important contributions on the statistical side are by Hotelling (8), who developed the principal component analysis, and Lawley (11, 12), who proposed a test criterion for judging the significance of factors in addition to working out the maximum-likelihood equations of estimation. These two authors were, however, considering two different problems, both of which seem to have important application. They are sometimes considered as two [...] problem providing the same answer. In theory it helps to make a distinction between the two. The term principal component analysis (PCA) should be used for Hotelling's formulation of the problem and its solution; the term factor analysis should be used for the specialized formulation considered in psychological literature and for the various solutions offered (see also 10). Lawley was considering the latter problem under the assumption that the variables (test scores) are normally distributed.

Illustrations have appeared from time to time to show that PCA gives the same relative magnitudes of factor loadings as any effective method of factor analysis. This is true only when what have been termed as communalities are very nearly equal for all the tests, as shown in section 3.1 of this paper.
The PCA is sometimes modified (3, p. 114; 7) by the insertion of communalities in the diagonal of the correlation matrix. This method, called the principal factor analysis (PFA), seems to provide a valid approach to the problem of factor analysis; however, it carries with it the flavor of principal


explains most effectively the correlations between the test scores in a battery. 


This method may be called a canonical factor analysis (CFA). Formulas for 
estimation are detailed in section 4. 


[...] test for factor analysis is given by Lawley (11). [...] Lawley's test yielding slightly more precise results [...]. It is also shown (section 4.3) that the test criterion can be calculated during the process of estimation and used in obtaining a lower confidence limit to the number of factors.

Recently Bartlett (1, 2) proposed a test inv[...] the correlation matrix intended to study [...] to the variance of the measurements." [...]ed here in examining which of the methods, component or factor analysis, is relevant in problems of psychological research or whether both methods [...]

2. Problems of Factor and Component Analyses


2.1 Factor Analysis
what are called common and specific or individual factors. If $x_1, \ldots, x_p$ denote $p$ different measurements on an individual, then $x_i$ is written

$$x_i = z_i + s_i \qquad (i = 1, \ldots, p), \tag{2.1.1}$$

where $z_i$, the variables depending on common factors, and $s_i$, the variables depending on specific factors, satisfy the following conditions of zero covariance:

$$\mathrm{cov}(z_i, s_i) = 0, \quad \mathrm{cov}(z_i, s_j) = 0, \quad \mathrm{cov}(s_i, s_j) = 0 \qquad (i \neq j). \tag{2.1.2}$$

Sometimes, another independent variable representing unreliability in the measurement $x_i$ is added to $(z_i + s_i)$, but for purposes of factor analysis based on unrepeated test scores of individuals, this variable can be combined with $s_i$ without loss of generality. If such repeated test scores are available, then a more comprehensive analysis of the common and specific factors is possible. This latter analysis is not considered here.

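From the structural setup (2.1.1), (2.1.2) it follows

$$V(x_i) = V(z_i) + V(s_i),$$

$$\sigma_{ii} = \gamma_{ii} + \delta_i^2,$$

$$\mathrm{cov}(x_i, x_j) = \mathrm{cov}(z_i, z_j) + \mathrm{cov}(z_i, s_j) + \mathrm{cov}(s_i, z_j) + \mathrm{cov}(s_i, s_j) = \mathrm{cov}(z_i, z_j),$$

$$\sigma_{ij} = \gamma_{ij} \qquad (i \neq j).$$

If $\Sigma$, $\Gamma$, $\Delta$ denote the dispersion matrices of the vector variables $x$, $z$, $s$, then

$$\Sigma = \Gamma + \Delta,$$

where $\Delta$ is a diagonal matrix.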
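The decomposition of the dispersion matrix into a common part and a diagonal specific part can be illustrated with a small numerical sketch. The loadings and specific variances below are invented for illustration, and building the common part as a product of a loading matrix with its transpose is one convenient way (assumed here) of obtaining a matrix of low rank.

```python
def common_dispersion(loadings):
    """Gamma = L L' for a p x k matrix of (hypothetical) factor loadings."""
    p = len(loadings)
    return [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
             for j in range(p)] for i in range(p)]

def add_specific(gamma, delta):
    """Sigma = Gamma + Delta, with Delta diagonal (specific variances)."""
    p = len(gamma)
    return [[gamma[i][j] + (delta[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]

loadings = [[0.8], [0.6], [0.5]]   # illustrative: one common factor, three tests
delta = [0.36, 0.64, 0.75]         # illustrative specific variances
gamma = common_dispersion(loadings)
sigma = add_specific(gamma, delta)

# Off-diagonal elements of Sigma and Gamma coincide: the covariances between
# the observed variables are carried entirely by the common part.
off_diagonal_equal = all(sigma[i][j] == gamma[i][j]
                         for i in range(3) for j in range(3) if i != j)
```

Only the diagonal of the observed matrix is affected by the specific variances, which is why the estimation problem centers on the diagonal terms.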
It is seen from the above analysis that any correlation between $x_i$, $x_j$ is solely due to the correlation between $z_i$, $z_j$. What we can actually observe are the values of the variables $x$ on a group of individuals but not $z$, $s$, which are not operationally defined and whose hypothetical existence is postulated. We thus obtain an estimate of the matrix $\Sigma$. The subject of factor analysis is mainly concerned with the estimation of the matrix $\Gamma$ starting with an estimate of $\Sigma$. The object is not to find any matrix $\Gamma$ satisfying the condition $\Sigma = \Gamma + \Delta$ but the one which has the least complexity leading to a parsimonious description of the relationships between the observable variables $x$. The complexity, when defined as the rank of the matrix $\Gamma$, has a special significance for the problems on which this technique is applied, as shown in the subsequent sections of this paper.
Some of the statistical problems of factor analysis are:

(a) to estimate the minimum rank of the dispersion matrix (variances and covariances) $\Gamma$ of the variables $z_1, \ldots, z_p$ occurring in the structural equations (2.1.1, 2.1.2),

(b) to test any hypothesis specifying the minimum rank of $\Gamma$,

(c) to estimate a basis of the common factor space (defined below),


Said to be independent if no 
(all a; = 0 simultaneously) 


factors. If Z, yore 
terms of Z, . 


Zi = a4Z, +... ай Я 
The covariance of 7; with 2, is a, 
(2.1.8). This may be regarded a 


relations 
а a, 
› (2.1.4) 
а Qpr, 


which is also called the factor 1 


because the choice of Z, , or the represent: 
tests. From this point of view one basis is as good as any other. Of course, given one basis, orthogonal or oblique, the other can be derived by a linear transformation.

The choice of a suitable basis which is "psychologically meaningful" has largely rested with the psychologist, perhaps rightly so. But it is quite conceivable, once the psychological meaning is translated to mean some precisely stated restrictions on the basis, that its choice will turn out to be a problem of statistical estimation. In this sense the graphical methods of rotation of factor loadings advocated by Thurstone (17) and the quartimax method of Neuhaus and Wrigley (13) are statistical methods of factor analysis, where the number of zero or small loadings is maximized. Such a restriction, or even the orthogonality of a basis, may not be the most helpful in leading to a suitable psychological interpretation or the discovery of "real entities."

The choice of the restrictions to be imposed on the basis is perhaps a problem for psychological research. An objective method developed in this connection by Cattell (4, 5) seems to have interesting possibilities from a statistical point of view.


2.2. Principal Components 

It was pointed out by Sir Cyril Burt that this method was originally put forward by Karl Pearson in 1901. But the statistical problems of estimation and testing connected with the principal components were first considered by Hotelling.

Hotelling (8) considers two types of problems:

First, without assuming a decomposition of the measurements as in (2.1.1), hypotheses are framed in terms of the latent roots of the correlation matrix with a view to studying the shape of the scatter of the standardized scores in the p-dimensional space or alternatively the relative importance of the different principal components in explaining the total variance. For instance, if some of the calculated roots are not significantly different then the components corresponding to them may be considered equally important.

Let us consider a specific hypothesis that the $(i + 1)$th to the $(i + r)$th roots of the population correlation matrix (denoted by $\rho$) are equal. The value of $i$ may be 0, 1, ... to $(p - r)$.

This hypothesis imposes a restriction on $\rho$, viz., that it admits the decomposition

$$\rho = \theta + \lambda I, \tag{2.2.1}$$

where $\theta$ is a matrix of rank $(p - r)$, $\lambda$ is the common value of the roots from $(i + 1)$th to $(i + r)$th, and $I$ is the unit matrix.

This hypothesis or a similar one based on the dispersion matrix $\Sigma$ instead of $\rho$ can be tested by a likelihood-ratio criterion $\Lambda$, provided sample size is moderately large. Exactly how large the sample size should be is a matter for further investigation. The statistic $(-2 \log_e \Lambda)$ is distributed in large samples as $\chi^2$ with degrees of freedom equal to the number of restrictions on the free parameters imposed by the hypothesis.


The number of restrictions in a hypothesis of the form (2.2.1) is equal 


dependent; the rest of the elements depend on them. Therefore 


, the number 
of restrictions on the elements of 0 is 
(p—7(»—r4- 1)/2, 
and with one less we have 
Фф -т— Дур ж. 2)/2 (2.2.2) 


degrees of freedom for the $\chi^2$ approximation.
If $A$ is the estimated dispersion matrix from observations on $n$ individuals, then the test criterion for the hypothesis (2.2.1) with $\Sigma$ instead of $\rho$ is

Ed c Га 
(п MEET 


(2.2.3) 
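where $\hat{\Sigma}$ is the estimated dispersion matrix under the conditions of the hypothesis. The latent roots of $\hat{\Sigma}$

$$\mu_1, \mu_2, \ldots, \mu_i, \mu_{i+1}, \ldots, \mu_{i+r}, \mu_{i+r+1}, \ldots, \mu_p$$

are connected with the latent roots $\lambda_1, \ldots, \lambda_p$ of $A$ in the following way:

$$\mu_j = \lambda_j \qquad (j = 1, \ldots, i;\; i + r + 1, \ldots, p),$$

$$\mu_{i+1} = \cdots = \mu_{i+r} = \frac{\lambda_{i+1} + \cdots + \lambda_{i+r}}{r}.$$

Since $|A| = \lambda_1 \lambda_2 \cdots \lambda_p$ and $|\hat{\Sigma}| = \mu_1 \mu_2 \cdots \mu_p$, the ratio $|A| / |\hat{\Sigma}|$ is

$$\frac{\lambda_{i+1} \cdots \lambda_{i+r}}{\left[(\lambda_{i+1} + \cdots + \lambda_{i+r})/r\right]^{r}}, \tag{2.2.4}$$

which is a suitable power of the ratio of the geometric to the arithmetic mean of the $(i + 1)$th to $(i + r)$th roots of $A$. From this point of view it would appear, by choosing $i = (p - r)$, that [...] (1) test using the [...] is valid for judging the significance of equality of the least $r$ roots.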
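The ratio in (2.2.4) depends only on the roots hypothesized to be equal, and it can be evaluated directly. The sketch below computes the ratio of the product of those roots to the r-th power of their arithmetic mean. It deliberately stops short of the full test statistic, since the multiplying factor in (2.2.3) is not legible in this copy. The function name is hypothetical.

```python
import math

def determinant_ratio(roots):
    """Ratio |A| / |Sigma-hat| for a block of roots hypothesized equal.

    Equals (geometric mean / arithmetic mean) ** r, where r = len(roots).
    """
    if not roots or any(x <= 0 for x in roots):
        raise ValueError("roots must be a non-empty list of positive values")
    r = len(roots)
    arithmetic_mean = sum(roots) / r
    return math.prod(roots) / arithmetic_mean ** r

# Two roots, 4 and 1: product 4, mean 2.5, ratio 4 / 6.25.
ratio_unequal = determinant_ratio([4.0, 1.0])
# Exactly equal roots give a ratio of 1 (no evidence against equality).
ratio_equal = determinant_ratio([2.0, 2.0, 2.0])
```

The ratio never exceeds 1, and the further it falls below 1 the more the observed roots depart from equality.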
Unfortunately the test does not seem to reduce to the form (2.2.4) in
terms of the roots of the observed correlation matrix R when the hypothesis 
is as stated in (2.2.1) in terms of the population correlation matrix p. The 
effect of standardizing the variables by the sample standard deviations is not 
properly allowed for by a criterion of the form (2.2.4). This is also partly 
revealed by Bartlett’s own evaluation of the degrees of freedom by the ex- 
pectation method in a simple case. They depend on the unknown correlations 
and reach the value (2.2.2) only in a limiting case, while for a genuine likeli- 
hood ratio this is not expected. The exact evaluation of the test criterion 
depends on complicated equations which require further investigation. 

Secondly, Hotelling considers the problem of “testing the variances of 
components against the variance to be expected on account of the inaccuracy 
of the tests as revealed by their self-correlations or reliability coefficients.” 
For this purpose a test score is thought of as made up of two parts, a true 
score with variance unity and a random error. Thus 


$$x_i = X_i + \epsilon_i \qquad (i = 1, 2, \ldots, p), \tag{2.2.5}$$

with the conditions $\mathrm{cov}(\epsilon_i, \epsilon_j) = 0$, $(i \neq j)$. The hypothesis stated above is interpreted to imply that the true scores $X_i$ are linearly dependent, i.e., "the scatter diagram of the true scores will lie in a flat space of smaller dimensionality immersed in the p-dimensional space." If independent estimates of the variances of $\epsilon_i$ are available, either from an external source or by re-tests on individuals, there is no need to consider the true scores as random variables in order to test the above hypothesis. The general multivariate tests of dimensionality developed in more complicated situations are directly applicable for this problem. The non-stochastic model on the scores corrected for unreliabilities used in testing the second hypothesis provides a strong contrast to tests in factor analysis where, of necessity, all the variables (the common and specific factors) involved are considered to be stochastic, which makes the problem more complex.


3. Special Characterizations of a Basis in Factor Analysis 


Using the vector notation

$$x = (x_1, \ldots, x_p), \quad z = (z_1, \ldots, z_p), \quad s = (s_1, \ldots, s_p),$$

the equation (2.1.1) can be written

$$x = z + s. \tag{3.1}$$

The dispersion matrix of $x$ (using $D$ for dispersion) is

$$D(x) = D(z) + D(s), \quad \text{or} \quad \Sigma = \Gamma + \Delta, \tag{3.2}$$

where $\Sigma$, $\Gamma$ and $\Delta$ are defined by equation (3.2). The covariance of $z$ and $s$ is zero because of conditions (2.1.2). The matrix $\Gamma$ is positive semi-definite with rank $k < p$ and $\Delta$ is a positive-definite diagonal matrix. The equation (3.2) supplies the fundamental decomposition of the dispersion matrix $\Sigma$ in terms of those of the hypothetical variables postulated by a factorial structure. If the rank of $\Gamma$ is $k < p$, the space of common factors has a basis of $k$ independent factors as shown in section 2.1. For a proper identification of the space and "an orderly selection of independent factors" there is a need to characterize a basis in a convenient way. A basis so characterized need not admit a psychological interpretation, for only mathematical and statistical convenience is being sought at this stage. A basis once obtained can always be transformed to meet other requirements. Two special characterizations are discussed here.


3.1 First Characterization 


Let $l = (l_1, \ldots, l_p)$ be a vector of arbitrary coefficients giving rise to a new factor variable

$$lz' = l_1 z_1 + \cdots + l_p z_p.$$

The variation in the variable $x_i$ explained by the factor variable $lz'$ is

$$\frac{\mathrm{cov}^2(x_i, lz')}{V(lz')} = \frac{(l_1\gamma_{i1} + \cdots + l_p\gamma_{ip})^2}{l\Gamma l'}, \tag{3.1.1}$$

assuming that $l\Gamma l'$, the variance of $lz'$, is not zero. The total variation explained in all the variables is

$$\sum_{i=1}^{p} \frac{(l_1\gamma_{i1} + \cdots + l_p\gamma_{ip})^2}{l\Gamma l'} = \frac{l\Gamma\Gamma' l'}{l\Gamma l'} = \frac{l\Gamma^2 l'}{l\Gamma l'}. \tag{3.1.2}$$

Let us choose $l$ such that (3.1.2) is a maximum. Differentiating with respect to the vector $l$ (see 14, p. 21), the equation leading to the optimum value $\lambda$ of the ratio (3.1.2) and the vector $l$ is

$$l\Gamma^2 - \lambda l\Gamma = 0,$$

or eliminating $l\Gamma$

$$|\Gamma - \lambda I| = 0, \tag{3.1.3}$$
where $I$ is the identity matrix. This shows that $\lambda$ is the maximum latent root of $\Gamma$ and $m = l\Gamma$ is the latent vector corresponding to it. Since the vector $m$ satisfies the equation

$$m\Gamma = \lambda m,$$

and

$$m = l\Gamma$$

so that

$$m\Gamma = \lambda l\Gamma, \tag{3.1.4}$$

the vector $m$ itself can be taken to be a solution of $l$. We thus obtain the first factor variable as a linear combination of $z_1, \ldots, z_p$. From the theory of canonical roots and vectors (14, p. 24), it would then follow that the second factor variable, which explains the highest proportion of the residual variation independently of the first, is the linear combination corresponding to the second canonical vector. There are as many linear combinations as there are non-zero roots $\lambda$, which is equal to the rank of the matrix $\Gamma$. The linear combinations of $z_1, \ldots, z_p$ supplied by the canonical vectors of zero roots of $\Gamma$ vanish identically, indicating the dependence of the factor variables associated with the measurements $x_1, \ldots, x_p$.

The factor loading of the variable $x_i$ on the first factor chosen above is the correlation between the two. The covariance is

$$\mathrm{cov}(x_i, lz') = l_1\gamma_{i1} + \cdots + l_p\gamma_{ip} = \lambda l_i,$$

and if the variables $x_i$ are initially chosen to have unit variance the correlation is

$$\frac{\lambda l_i}{\sqrt{l\Gamma l'}} = \frac{\sqrt{\lambda}\, l_i}{\sqrt{ll'}}. \tag{3.1.5}$$
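The first characterization can be sketched numerically: find the largest latent root of the common dispersion matrix and its latent vector, then standardize the vector to obtain loadings. Power iteration is used here only because it needs no external library; it is a stand-in chosen for this illustration and not a method named in the source. The sketch assumes that the matrix is known and that the observed variables have unit variance.

```python
import math

def largest_latent_root(matrix, iterations=200):
    """Power iteration for the largest latent root and unit latent vector
    of a symmetric positive semi-definite matrix.

    Assumes the starting vector (all ones) is not orthogonal to the
    dominant latent vector.
    """
    p = len(matrix)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("matrix annihilates the starting vector")
        v = [x / norm for x in w]
    w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    root = sum(vi * wi for vi, wi in zip(v, w))
    return root, v

def first_factor_loadings(gamma):
    """Loadings sqrt(lambda) * l_i / sqrt(l l') for unit-variance variables."""
    root, l = largest_latent_root(gamma)
    length = math.sqrt(sum(x * x for x in l))
    return [math.sqrt(root) * li / length for li in l]

# Illustrative rank-one common dispersion matrix built from loadings .8, .6, .5.
a = [0.8, 0.6, 0.5]
gamma_example = [[ai * aj for aj in a] for ai in a]
loadings_example = first_factor_loadings(gamma_example)
```

With a rank-one matrix the recovered loadings coincide with the values used to build it, which gives a quick consistency check on the standardization.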


The factor loadings are then the elements of the first canonical vector suitably standardized. Similarly the factor loadings of any other factor are derived from the canonical vector [...].

Even after exhausting all the independent factor variables, there still remains some variation [...] the specific factors unless the number of independent common factors is equal to $p$. In the problem originally considered by Hotelling, the successive components explaining variation in $x$ were not confined to the common factors [...] functions of the specific factors $s_i$ which then are equivalent [...] of $x$. Hotelling's principal components are, therefore, important in problems where the total variation of a measurement vector $x$ is sought to be accounted for, to the maximum amount possible, by a smaller number of linear functions of $x$. The principal components of Hotelling are derived from the latent vectors of the matrix $\Sigma = \Gamma + \Delta$ instead of $\Gamma$ alone as used above. It may be observed that when

$$\Delta = \delta^2 I,$$

i.e., when all the specific variables have the same variance $\delta^2$, a latent vector $l$ of $\Gamma + \Delta$ satisfying the equation

$$l(\Gamma + \delta^2 I) = \mu l$$

also satisfies the equation

$$l\Gamma = (\mu - \delta^2)\, l = \lambda l$$

and is therefore a latent vector of $\Gamma$. The principal component analysis of Hotelling is thus a method of factor analysis with the factor loadings inflated, keeping the same relative magnitudes, when all the specific variances are the same.

There is some arbitrarin 
set of factors because instead 
plained in z, , ... , 1 
different basis and consequently a different set; 
variables x, , ... `, x, are cho: i 
is equivalent to using reciprocals of total variance 

The quantity $\delta_i^2$, the residual variance of $x_i$ unexplained by the factor variables, satisfies the equation

$$\sigma_{ii} - \frac{\lambda_1 l_i^2}{ll'} - \frac{\lambda_2 m_i^2}{mm'} - \cdots = \delta_i^2, \tag{3.1.6}$$

where $l$, $m$, ... are the latent vectors defining the factors $lz'$, $mz'$, ... and the subscript $i$ relates to the $i$th element in the vectors.

The best formula for predicting a factor variable such as $lz'$ from the observed measurements $x$ is obtained by the method of regression (16). If $kx'$ is the predicted value, then $k$ satisfies the equation [...]

3.2 Second Characterization 


Instead of aski 
tion as Possible of 


correlation (or its Square) between th 


(ir E 
(1 (42$) (3.2.1) 


————— 


É — 
has to be maximized. Using the algebra developed in a similar genetic problem (14), the optimum value of the correlation $\nu$ is found to be a root of the equation

$$|\Gamma - \nu^2 \Sigma| = 0, \tag{3.2.2}$$

or

$$|\Sigma - \lambda\Delta| = 0, \qquad \lambda = 1/(1 - \nu^2). \tag{3.2.3}$$

The vectors $l$ and $q$ are proportional and satisfy the same equation

$$l(\Sigma - \lambda\Delta) = 0, \qquad q(\Sigma - \lambda\Delta) = 0. \tag{3.2.4}$$

That factor variable which is most highly correlated with $x$ is $lz'$, where $l$ is the latent vector corresponding to the largest root of the determinantal equation (3.2.2). The second factor variable, uncorrelated with the first and possessing the highest correlation with $x$, is $mz'$, where $m$ is the latent vector corresponding to the second root of (3.2.2), and so on. We get as many factors as the number of non-zero values of $\nu^2$ or values of $\lambda$ greater than unity, which is the same as the rank of $\Gamma$.
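The link between (3.2.2) and (3.2.3) can be illustrated numerically. Because the specific part is diagonal and positive, the roots of |Σ − λΔ| = 0 are the latent roots of Σ with rows and columns divided by the specific standard deviations. The sketch below finds the largest of these by power iteration (a convenience chosen for this illustration, not the source's method) and converts it to the squared canonical correlation. All numerical values are invented.

```python
import math

def largest_canonical_root(sigma, delta, iterations=500):
    """Largest root of |Sigma - lambda * Delta| = 0 for diagonal Delta.

    `delta` lists the diagonal elements (specific variances, all positive).
    Returns (lambda, nu squared) with nu^2 = 1 - 1 / lambda.
    """
    p = len(sigma)
    scaled = [[sigma[i][j] / math.sqrt(delta[i] * delta[j])
               for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(scaled[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w = [sum(scaled[i][j] * v[j] for j in range(p)) for i in range(p)]
    lam = sum(vi * wi for vi, wi in zip(v, w))
    return lam, 1.0 - 1.0 / lam

# Illustrative one-factor structure with loadings .8 and .6.
a = [0.8, 0.6]
delta = [0.36, 0.64]
sigma = [[a[i] * a[j] + (delta[i] if i == j else 0.0) for j in range(2)]
         for i in range(2)]
lam, nu_sq = largest_canonical_root(sigma, delta)
```

In this one-factor example the remaining root equals unity, in line with the statement that only roots greater than unity correspond to factors.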

For any factor $lz'$ as determined above

$$l\Gamma = l\Sigma - l\Delta = (\lambda - 1)\, l\Delta,$$

$$\mathrm{cov}(x_i, lz') = l_1\gamma_{i1} + \cdots + l_p\gamma_{ip} = (\lambda - 1)\, l_i \delta_i^2.$$

The correlation between $x_i$ and $lz'$,

$$\frac{(\lambda - 1)\, l_i \delta_i^2}{\sqrt{(\lambda - 1)\, l\Delta l'\, \sigma_{ii}}}, \tag{3.2.5}$$

is the factor loading of $x_i$ on the factor $lz'$. This is again an element of $l$ multiplied by a constant. It can be shown that the same factor loadings are obtained if instead of $(x_1, \ldots, x_p)$ we consider $(c_1x_1, \ldots, c_px_p)$ with the variables arbitrarily scaled. In the previous case it is necessary to reduce the variables $(x_1, \ldots, x_p)$ to unit standard deviation before proceeding to derive factors in order to achieve uniqueness of factor loadings.

To predict the factor measurements we use the regression equation as in (3.1.7). In this case it turns out that $lz'$, $mz'$, ..., defined by the latent vectors of (3.2.3), can be best predicted by $lx'$, $mx'$, ..., avoiding the complication of multiplication by $\Sigma^{-1}$ necessary in the case of factors defined in the earlier characterization of the basic set (section 3.1).

The residual variance $\delta_i^2$ in $x_i$ unexplained by the factor variables satisfies the equation

$$\sigma_{ii} - \frac{(\lambda_1 - 1)\, l_i^2 \delta_i^4}{l\Delta l'} - \frac{(\lambda_2 - 1)\, m_i^2 \delta_i^4}{m\Delta m'} - \cdots = \delta_i^2, \tag{3.2.6}$$
aA 
similar to the formula (3.1.6) in the earlier case. This second characterization of a basis together with methods of estimation may be called canonical factor analysis (CFA) to bring out its connection with the theory of canonical correlations.

Factor analysis thus fits in a general theory of canonical correlations involving two sets of variables: one set being observable and the other set, observable as in multiple regression; dummy as in multiple discrimination; or hypothetical as in problems of genetic selection.


3.3 Which Is a Better Characterization? 


purpose. 

But this is no longer true when we have only estimates of the dispersion 
elements and factors are estimated by formally substituting for X the esti- 
mated quantities and choosing A to satisfy the equation (3.1.6) in the first 
case (PFA) and (3.2.6) in the second case (CFA). Which then is a better 
estimate of a basis? 


From the point of view of statistical estimation, PFA gives a least- 


from a description of their measurement. 


There is another logical argument which may have to be borne in mind 


neerning the number of in- 
and a test of this null hypo- 
а serious departure. If then 


(given number not necessarily exhaustive) factors explaining the maximum possible variance in the measurements while CFA, the best k factors which have in some sense highest possible correlations with the measurements. This may mean that while the first set attempts to explain as much as possible of the variations in the individual measurements, the latter set focuses on the correlations. Perhaps the psychological interest chiefly lies in the latter set, which offers a better explanation of the correlations between the measurements.
4. Estimation and Tests of Significance for Factors 


4.1 Estimation of Factor Loadings 

Let $A = (a_{ij})$ denote the observed dispersion matrix of the vector
variable $x$. This is sufficient for the estimation of $\Gamma$ and $\Delta$, the two
components of the population dispersion matrix $\Sigma$. Following the equations
(3.2.3, 3.2.4) of the second characterization, we have on substituting $A$ for $\Sigma$


$|A - \lambda\Delta| = 0, \qquad l(A - \lambda\Delta) = 0,$    (4.1.1)


where $l$ is a latent vector corresponding to the latent root $\lambda$. From the point
of view of mechanical computations it is convenient to solve for


$b = l\Delta^{1/2}, \qquad bb' = 1,$    (4.1.2)


in which case $b$ is the latent vector of


$|\Delta^{-1/2} A \Delta^{-1/2} - \lambda I| = 0.$    (4.1.3)


Let us suppose, for the sake of illustration, that we are extracting two factors.
If $b = (b_1, \cdots, b_p)$ and $c = (c_1, \cdots, c_p)$ are the first two latent vectors of
(4.1.3) corresponding to the roots $\lambda_1$ and $\lambda_2$, then the equation (3.2.6) gives


$a_{ii} = \{(\lambda_1 - 1)b_i^2 + (\lambda_2 - 1)c_i^2 + 1\}\delta_i^2,$    (4.1.4)


or


$a_{ii}/\delta_i^2 = (\lambda_1 - 1)b_i^2 + (\lambda_2 - 1)c_i^2 + 1 = g_i^2,$    (4.1.5)


where $g_i$ is defined by the last part of equation (4.1.5). The equation (4.1.3)
may be written in terms of the observed correlation matrix $R$ instead of
the dispersion matrix $A$


$|GRG - \lambda I| = 0,$    (4.1.6)


where the elements $g_i$ of the diagonal matrix $G$ satisfy


$g_i^2 = (\lambda_1 - 1)b_i^2 + (\lambda_2 - 1)c_i^2 + 1.$    (4.1.7)


The computational problem is then to solve for $g_i$ satisfying the equations
(4.1.6) and (4.1.7), where $\lambda_1$, $\lambda_2$ are the latent roots of (4.1.6) and $b$, $c$, the
latent vectors. A tentative method is to start with a trial matrix $G$ and obtain
successive approximations by solving (4.1.6) for $\lambda_1$, $\lambda_2$, and $b$, $c$, and
substituting in (4.1.7). The process is repeated until the $g_i$ converge.
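
The cycle just described (solve the latent-root problem for a trial G, then correct G from the roots and vectors) can be sketched in a few lines. This is only an illustrative sketch, assuming NumPy is available; the function names `cfa_step`, `cfa_loadings`, and `cfa_iterate`, the tolerance, and the iteration cap are hypothetical choices, and the update used is the plain substitution of the largest k roots into the relation for the squared elements of G, not the accelerated formula mentioned next in the text.

```python
import numpy as np


def cfa_step(R, g, k):
    """One cycle: solve |GRG - lambda*I| = 0, then update g from the k largest roots."""
    R = np.asarray(R, dtype=float)
    g = np.asarray(g, dtype=float)
    M = g[:, None] * R * g[None, :]            # the matrix G R G for diagonal G
    roots, vecs = np.linalg.eigh(M)            # eigh returns roots in ascending order
    order = np.argsort(roots)[::-1]            # re-sort in descending order
    roots, vecs = roots[order], vecs[:, order]
    excess = np.clip(roots[:k] - 1.0, 0.0, None)   # (lambda_j - 1), floored at zero
    g_new = np.sqrt(1.0 + (vecs[:, :k] ** 2) @ excess)
    return g_new, roots, vecs


def cfa_loadings(g, roots, vecs, k):
    """Loadings sqrt(lambda_j - 1) * a_j * G^{-1}, returned as a p-by-k array."""
    excess = np.clip(roots[:k] - 1.0, 0.0, None)
    return vecs[:, :k] * np.sqrt(excess) / np.asarray(g, dtype=float)[:, None]


def cfa_iterate(R, g0, k, tol=1e-8, max_iter=500):
    """Repeat cfa_step until successive g differ by less than tol (or max_iter is hit).

    The returned roots and vectors belong to the G used in the last cycle.
    """
    g = np.asarray(g0, dtype=float)
    for _ in range(max_iter):
        g_new, roots, vecs = cfa_step(R, g, k)
        converged = np.max(np.abs(g_new - g)) < tol
        g = g_new
        if converged:
            break
    return g, roots, vecs
```

For a correlation matrix that exactly satisfies a k-factor model, the true G is a fixed point of `cfa_step`, which gives a convenient check of an implementation. With sample data the text warns that convergence can be slow, so the iteration cap matters.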

A better approximation to g: is obtained by using the formula 


AAU. eem fu a 
EE n 
where 
2 
х= рем, F (4.1.9) 


the summation $[\Sigma g^2]$ refers to the $g_i$ at the previous stage used in equation
(4.1.7) to obtain $\lambda_1$, $\lambda_2$.

The two formulas (4.1.7) and (4.1.8) should agree towards the final
stages when convergence is expected to be slow. But in the initial stages
(4.1.8) may accelerate convergence.

The estimated factor loadings on the first and second factors at any
stage of approximation are


$\sqrt{\lambda_1 - 1}\; b\,G^{-1}, \qquad \sqrt{\lambda_2 - 1}\; c\,G^{-1},$


one out of a number of possible solutions. Th 


designated by “h, jh, m 
communalities at the first stage. 


4.2 Tests of Significance and Estimation of Number of Factors 


It is also necessary 
of factors to be estimated.
procedure. This is no doubt an objective rule for determining the lower limit
to the number of factors, but in practice it may be better to extract one or 
two more factors, depending on the magnitude of the residual roots. If one 
or two such roots are sufficiently bigger than unity (though not significantly 
so) it may be worth while to extract the factor corresponding to them also. 

The hypothesis we propose to test is that the population dispersion matrix
admits the decomposition


$\Sigma = \Gamma + \Delta,$    (4.2.1)


where $\Delta$ is a diagonal matrix with positive terms and $\Gamma$ is a positive
semi-definite matrix of rank $k < p$.

The test criterion we use is derived by the principle of likelihood ratio, 
assuming that the observations are normally distributed. 

The exact distribution of the test criterion is not known but in large
samples ($-2 \log$) of the likelihood ratio is distributed as $\chi^2$ with degrees of
freedom equal to the number of independent restrictions on the elements of
$\Sigma$ imposed by the hypothesis (4.2.1). This hypothesis specifies the rank of
the matrix $\Sigma - \Delta$ for suitably chosen $\Delta$. If its rank is $k$, then by fixing the
first $k$ rows and columns the rest of the elements can be computed, which
implies $(p - k)(p - k + 1)/2$ restrictions. Allowing for $p$ unknown values
in $\Delta$, the number of restrictions is equal to


$(p - k)(p - k + 1)/2 - p = [(p - k)^2 - p - k]/2.$    (4.2.2)
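
As a quick arithmetic check of the count of restrictions, the following sketch evaluates the degrees of freedom for given p and k. The function name `factor_test_df` is a hypothetical label, and the closed form is the one stated in the text, $[(p - k)^2 - p - k]/2$.

```python
def factor_test_df(p, k):
    """Degrees of freedom [(p - k)^2 - p - k] / 2 for testing k factors on p variables."""
    twice_df = (p - k) ** 2 - p - k     # always even, so integer division is exact
    return twice_df // 2


# Nine variables and two factors, as in the numerical example reported later.
df_two_factors = factor_test_df(9, 2)
```

For nine variables and two factors this gives 19, the figure quoted with the numerical example at the end of the paper.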


The test based on the likelihood-ratio criterion is


$-(n - 1)\log\,(|A| / |\hat\Sigma|),$    (4.2.3)


where $\hat\Sigma$ is the estimated dispersion matrix using the maximum-likelihood
equations of section 4.1. The multiplying coefficient $(n - 1)$, where $n$ is the
sample size, may be replaced by the more appropriate value for the $\chi^2$
approximation to hold when $n$ is not large,


$\left(n - 1 - \frac{2p + 5}{6} - \frac{2k}{3}\right),$


where $p$ is the number of variables and $k$ is the number of factors (1). Since
the roots of the equation $|\Sigma - \lambda\Delta| = 0$ corresponding to $k$ factors are
estimated by $|A - \lambda\Delta| = 0$ or the equivalent forms (4.1.3), (4.1.6), it follows
that the roots of $|\hat\Sigma - \lambda\Delta| = 0$ are


$\lambda_1, \cdots, \lambda_k, 1, \cdots, 1,$    (4.2.4)


while the roots of $|A - \lambda\Delta| = 0$ are, in descending order of magnitude,


$\lambda_1, \cdots, \lambda_k, \lambda_{k+1}, \cdots, \lambda_p,$    (4.2.5)


and, therefore,


$|A| / |\hat\Sigma| = \lambda_{k+1} \cdots \lambda_p,$    (4.2.6)


which is the product of the least $(p - k)$ roots, at the last stage of iteration of
the equation (4.1.6), $|GRG - \lambda I| = 0$.
The $\chi^2$ test is


$-(n - 1)\log\,(\lambda_{k+1} \cdots \lambda_p)$    (4.2.7)


with $[(p - k)^2 - p - k]/2$ degrees of freedom apart from the slight refinement
in the multiplying coefficient.
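
The criterion is simple to evaluate once the latent roots are in hand. The sketch below is illustrative only: the function name `residual_root_chi_square` and the `bartlett` flag are hypothetical, and the refined multiplier is assumed to be $n - 1 - (2p + 5)/6 - 2k/3$, the form usually attributed to Bartlett.

```python
import math


def residual_root_chi_square(roots, k, n, bartlett=False):
    """Return -(multiplier) * log(product of the p - k smallest roots).

    roots    : all p latent roots of |GRG - lambda*I| = 0 at the final stage
    k        : number of factors assumed
    n        : sample size
    bartlett : if True, use n - 1 - (2p + 5)/6 - 2k/3 (assumed refinement)
               in place of n - 1
    """
    ordered = sorted(roots, reverse=True)
    p = len(ordered)
    log_product = sum(math.log(r) for r in ordered[k:])
    multiplier = n - 1
    if bartlett:
        multiplier = n - 1 - (2 * p + 5) / 6 - 2 * k / 3
    return -multiplier * log_product
```

The value is then referred to a chi-square table with $[(p - k)^2 - p - k]/2$ degrees of freedom. Since the product of the residual roots is close to unity when k factors suffice, the statistic is small in that case.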


4.3 A Modified Criterion and Its Practical Use 


—@ — 1) ов Qua ++ X) — (p — k) log ^.], (4.3.1) 


° , A, . [Strangely the 
converge ultimately 
liklihood ratio), resembles Bartlett's (1) 


that Bartlett’s ratio is an initial a 
(4.3.1) converges to 


—(n — 1) log Qua +++ А) 
at the final stage when 


@-H) mater da, (4.3.2) 
Suppose that (4.3.1) is not si کے ر ا‎ 0 ee 
of freedom, at any stage, ) — p — k]/2 deg 


is reached even after
of the analysis is to estimate the number of factors (lower confidence value)
as well as the factor loadings.

Before proceeding with the cycle of operations for estimation, let us fix
some high value of $k$ as the number of factors and calculate the roots after
one or two iterations. At this stage, find that value of $r$ for which $\Lambda_r$ (with
d.f. $[(p - r)^2 - p - r]/2$) is not significant, but $\Lambda_{r-1}$ is. This shows that the
number of factors is not greater than $r$. We may set the number of factors
provisionally at $r$ and continue the process of estimation. Each time we may
calculate $\Lambda_{r-1}$ and $\Lambda_r$ to see whether $\Lambda_{r-1}$ becomes not significant at any
stage. If it is not significant, there is a case for switching over to $(r - 1)$
factors instead of $r$.

5. Summary 

The experimental situation and the nature of the data on which the 
technique of factor analysis can be successfully employed may be stated as 
follows. Each of the $p$ measurements on an individual has a linear regression
on a common set of a few hypothetical variables or factors. The deviations
from regression for any two measurements are uncorrelated. The factor
analysis seeks the smallest number of independent hypothetical variables 
necessary to explain the intercorrelations between the measurements. 

If $R$ is the observed correlation matrix, the computational problem of
factor analysis depends on the solution of the diagonal matrix $G$ satisfying
the equations


$|GRG - \lambda I| = 0,$    (5.1)


$g_i^2 = (\lambda_1 - 1)a_{1i}^2 + \cdots + (\lambda_k - 1)a_{ki}^2 + 1,$    (5.2)


where $k$ is the number of factors assumed, $\lambda_1, \cdots, \lambda_k$ are the first $k$ largest
roots of (5.1) and $a_j = (a_{j1}, \cdots, a_{jp})$ is the latent vector corresponding to
the root $\lambda_j$. Once $G$ is found to satisfy the equations (5.1, 5.2), then the
factor loadings are given by


$(\lambda_j - 1)^{1/2}\, a_j\, G^{-1} \qquad (j = 1, \cdots, k),$


and the test of the hypothesis that $k$ factors are adequate to explain the
intercorrelations is


$\chi^2 = -(n - 1)\log_e\,(\lambda_{k+1} \cdots \lambda_p)$


with $[(p - k)^2 - p - k]/2$ degrees of freedom. The lower confidence limit
to the number of factors is the smallest value of $k$ for which $\chi^2$ is not
significant.

Some research remains to be done to find an elegant computational
technique for solving the equations (5.1, 5.2). The method available at
present is to guess suitable values of $g_i$, substitute in (5.1) and obtain better
approximations to $g_i$ by using (5.2). This process is continued until convergence
is secured. Unfortunately this appears to be a slow process unless the initial
values of $g_i$ are very near the true values. Even with a good set of trial values
the problem can be best tackled only on an electronic computer when large
numbers of variables are involved. A suitable program for Illiac is being
written by Mr. Golub of the Digital Computer Laboratory at the University
of Illinois. A numerical example solved on a tentative program is reported
below. Full details will be presented soon.


First it may be noted that the relation between $g_i$ and the communality
$h_i^2$ for the $i$th variate is


$g_i = 1/\sqrt{1 - h_i^2},$


mation g; — 1/2. 

Second, although the test involves the product of the roots at the final 
stages of convergence, it is useful to compute at intermediate stages the 
statistic 


$\chi^2 = -(n - 1)\,[\log_e(\lambda_{k+1} \cdots \lambda_p) - (p - k)\log_e\{(\lambda_{k+1} + \cdots + \lambda_p)/(p - k)\}],$


which, when not significant, implies the nonsignificance of the ultimate $\chi^2$.
We could stop at any stage after this, provided further iterations do not
considerably alter the factor loadings.

The following correlation matrix was presented by Davis (6) in an
attempt to study factors of comprehension in reading.


1.00
 .72  1.00
 .41   .34  1.00
 .28   .36   .16  1.00
 .52   .53   .34   .30  1.00
 .71   .71   .43   .36   .64  1.00
 .68   .68   .42   .35   .55   .76  1.00
 .51   .52   .28   .29   .45   .59   .57  1.00
 .68   .68   .41   .36   .55   .76   .68   .58  1.00

Assuming a single factor, the $\chi^2$ was calculated and found to be significant.
This indicated more than one factor. Under the hypothesis of two factors
the value of $\chi^2$


$-(n - 1)\,[\log_e(\lambda_3 \cdots \lambda_9) - (9 - 2)\log_e\{(\lambda_3 + \cdots + \lambda_9)/(9 - 2)\}]$


came down to 29.73 at an early stage of iteration. This being less than 30.1,
the 5 per cent significance value of $\chi^2$ with 19 degrees of freedom, the
hypothesis of two factors stands unrejected. So the data admit an interpretation
in terms of two significant factors only. A fairly stabilized set of factor
loadings is


Factor 1    .845   .817   .477   .401   .669   .891   .834   .651   .833
Factor 2   -.300  -.084   .012   .153   .161   .145   .081   .122   .080
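
The reported loadings can be turned into communalities, into the corresponding elements of G through the relation $g_i = 1/\sqrt{1 - h_i^2}$ noted above, and into reproduced correlations. The sketch below is only an illustration of that arithmetic; the names `FACTOR_1`, `FACTOR_2`, `communalities`, `g_from_communality`, and `reproduced_correlation` are hypothetical, and the loadings are read from the table with decimal points restored.

```python
import math

# Loadings on the two factors for the nine tests, as tabulated above.
FACTOR_1 = [0.845, 0.817, 0.477, 0.401, 0.669, 0.891, 0.834, 0.651, 0.833]
FACTOR_2 = [-0.300, -0.084, 0.012, 0.153, 0.161, 0.145, 0.081, 0.122, 0.080]


def communalities(*factors):
    """Sum of squared loadings of each variable over the supplied factors."""
    return [sum(load * load for load in row) for row in zip(*factors)]


def g_from_communality(h2):
    """Element of the diagonal matrix G implied by a communality h^2 < 1."""
    return 1.0 / math.sqrt(1.0 - h2)


def reproduced_correlation(i, j, *factors):
    """Correlation between variables i and j (0-based) implied by the loadings."""
    return sum(f[i] * f[j] for f in factors)


H2 = communalities(FACTOR_1, FACTOR_2)
G = [g_from_communality(h2) for h2 in H2]
R_12 = reproduced_correlation(0, 1, FACTOR_1, FACTOR_2)
```

The reproduced correlation between the first two tests comes out near .72, the observed value in the matrix above, which is the kind of agreement the two-factor hypothesis implies.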


I wish to thank Dr. C. F. Wrigley, who read the manuscript and offered
some helpful comments.


RELIABILITY FORMULAS FOR NONCOMPLETED OR
SPEEDED TESTS*


LOUIS GUTTMAN
THE ISRAEL INSTITUTE OF APPLIED SOCIAL RESEARCH


New formulas are developed to give lower bounds to the reliability
of a test, whether or not all respondents attempt all items. The formulas
apply in particular, then, to completed tests, pure speed tests, pure power
tests, and any mixture of speed and power. For the case of completed tests,
the formulas give the same answer as certain standard ones; for noncompleted
tests the formulas give a correct answer where previous standard formulas are
inappropriate. The formulas hold both in the sense of retest reliability and
of parallel tests.


I. Introduction 


Recently, there has been increasing awareness of an important inadequacy
of all standard formulas that are in current use for studying reliability
of tests. These formulas are not appropriate for tests in which not all items
are attempted by everybody. In particular, they do not hold for speeded
tests (cf. 1).

The present paper proposes a new analysis of the problem, and provides
some practical formulas that hold whether or not the tests are completed.
The case of completed tests emerges as a specialization of the present analysis.
Thus, formulas developed here hold for pure speed tests, pure power tests,
and for tests which are partly speed and partly power.

An important example of one of the practical formulas developed here
is as follows. Consider a test composed of $m$ dichotomous items. Each item
is scored unity if answered correctly, zero if answered incorrectly or not
attempted. Each person's total score is the sum of his scores on the $m$ items.
Suppose the test is administered once to a large population of individuals
and that there are $m - n$ items each of which has zero variance in its scores;
that is, on each of these $m - n$ items either all people scored 0 or all scored 1.
In particular, all items not attempted by anybody are in this subset of
$m - n$. The $n$ items with positive variance will have their statistics on the
single trial denoted as follows:


$\xi_j$ = proportion of the population that answered the $j$th item correctly

$\pi_j$ = proportion of the population that attempted the $j$th item
($j = 1, 2, \cdots, n$), regardless of whether the answer was correct
or not.


*This research was facilitated by an uncommitted grant-in-aid to the writer from
the Behavioral Sciences Division of the Ford Foundation.


Let $s_t^2$ denote the variance of total scores on the trial. It is immaterial whether
$s_t^2$ is computed from all $m$ items or only from the $n$ of positive variance,
since adding items with zero variance will not change $s_t^2$. Let $D'$ be defined by


$D' = \sum_{j=1}^{n-1} (n - j)\sqrt{\xi_j(1 - \pi_j)},$    (1)


and let $L_3'$ be defined as


$L_3' = \frac{n}{n-1}\left(1 - \frac{D' + \sum_{j=1}^{n}\xi_j(1 - \xi_j)}{s_t^2}\right).$    (2)

Then, if $\rho_t^2$ denotes the reliability coefficient for the total test scores, we
prove below that $L_3'$ is a lower bound to $\rho_t^2$, i.e.,

$L_3' \le \rho_t^2 \le 1.$    (3)

Another lower bound to $\rho_t^2$ derived below can be designated by $L_3''$.
To compute it, first compute $D''$ by

$D'' = 2\sum_{j=1}^{n-1}\left(\sqrt{1 - \pi_j}\sum_{g=j+1}^{n}\sqrt{\xi_g}\right),$    (4)
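
For a test of dichotomous items these quantities need only the item proportions and the total-score variance from one administration. The following sketch is illustrative and rests on stated assumptions: $D'$ and $D''$ are taken to be the right members of (48) and (38) derived later in the paper, the lower bound is taken in the form $\frac{n}{n-1}\{1 - (\sum \xi_j(1 - \xi_j) + D)/s_t^2\}$ for either $D$, items are listed in order of presentation, and the names `d_prime`, `d_double_prime`, and `lower_bound` are hypothetical.

```python
import math


def d_prime(xi, pi):
    """D' = sum over j = 1..n-1 of (n - j) * sqrt(xi_j * (1 - pi_j))."""
    n = len(xi)
    return sum((n - j) * math.sqrt(xi[j - 1] * (1.0 - pi[j - 1]))
               for j in range(1, n))


def d_double_prime(xi, pi):
    """D'' = 2 * sum over j of sqrt(1 - pi_j) * (sum over later items g of sqrt(xi_g))."""
    n = len(xi)
    total = 0.0
    for j in range(n - 1):
        later = sum(math.sqrt(x) for x in xi[j + 1:])
        total += math.sqrt(1.0 - pi[j]) * later
    return 2.0 * total


def lower_bound(xi, total_variance, d):
    """n/(n-1) * (1 - (sum of item variances + d) / total-score variance)."""
    n = len(xi)
    item_variance = sum(x * (1.0 - x) for x in xi)
    return n / (n - 1) * (1.0 - (item_variance + d) / total_variance)


# Hypothetical three-item test: proportions correct and proportions attempting.
xi = [0.8, 0.6, 0.5]
pi = [1.0, 0.9, 0.8]
s_t2 = 1.5   # hypothetical observed variance of total scores

L3_prime = lower_bound(xi, s_t2, d_prime(xi, pi))
L3_double_prime = lower_bound(xi, s_t2, d_double_prime(xi, pi))
L3_completed = lower_bound(xi, s_t2, 0.0)   # everybody attempts everything
```

When every proportion attempting equals one, both corrections vanish and the two bounds coincide with the completed-test value; otherwise each bound is lowered by its correction term.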


1 oncompleted— 
ts, they can yield very low bounds L/ and Lj'—even 


. In Part, this can be due to the greater room for 
? compared with completed tests, 


can be interpreted from the point of view either of retest
reliability or of parallel tests; the same lower bound (2) ensues in each case
(cf. 3). The same freedom of interpretation holds for all the formulas in
the present paper, since we restrict ourselves to but a single trial for the
actual numerical computations.


To establish that $L_3'$ is a lower bound to $\rho_t^2$, we begin by using the results
of a previous paper (3), wherein some important fundamental tautologies
and inequalities for $\rho_t^2$ are developed that make no assumptions whatsoever.
For practical use, a certain quantity denoted there by $\delta$ must be observable
or at least bounded from above. The contribution of the present paper is
essentially to establish upper bounds to $\delta$ that are observable from a single
trial, given a certain assumption discussed below. The quantities $D'$ and
$D''$ defined in (1) and (4) are such bounds to $\delta$ for the type of test described
above, and $L_3'$ and $L_3''$ are the resulting modifications (as computed from a
single trial) of the lower bound to $\rho_t^2$ denoted by $\lambda_3$ in (3).

Notice that if everybody attempts all items in the test just discussed,
so that $\pi_j = 1$ ($j = 1, 2, \cdots, n$), then the radicals in the right of (1) and (4)
vanish for all $j$, making $D' = D'' = 0$. In such a case, according to (2),
both $L_3'$ and $L_3''$ become the same as the lower bound $L_3$ discussed in (2),
or the usual lower bound for the case of completed tests wherein all items
are experimentally independent:


$L_3'' = L_3' = L_3 = \frac{n}{n-1}\left(1 - \frac{\sum_{j=1}^{n}\xi_j(1 - \xi_j)}{s_t^2}\right) \qquad (D' = D'' = 0).$    (5)


[As has been pointed out in (2), $L_3$ is algebraically the same as formulas
deduced in other contexts by Kuder and Richardson and by Hoyt, but
its derivation, interpretation, and use in (2) are quite different from those
of the others by virtue of the differing contexts. The context in which $L_3$
was originally derived is a special case of the context of $L_3'$ and $L_3''$ or of the
present paper. It is not clear at present whether the Kuder-Richardson or
the Hoyt formulations can be readily extended to the problem of noncompleted
or speeded tests.] $L_3'$ and $L_3''$ are more general than $L_3$ in that they
allow for possible experimental dependence among the items due to
noncompletion of the test.

Other lower bounds to $\rho_t^2$ which allow for possible experimental dependence
among items are also developed here, as well as bounds for tests in
which the items are not dichotomies, or are not scored with 0-1 weights.

As one of the referees of this paper has pointed out, it may be desirable
also to estimate the total error variance itself and not just $\rho_t^2$. The variance
in question is that denoted by $\epsilon^2$ in 3, p. 229. Error variances will in general
vary from individual to individual, especially in speeded tests, and $\epsilon^2$ is
their mean over all individuals. The lower bounds to $\rho_t^2$ serve this purpose
by virtue of the relation $\epsilon^2 = \sigma_t^2(1 - \rho_t^2)$, where $\sigma_t^2$ is estimated
by $s_t^2$. Thus $\epsilon^2 \le s_t^2(1 - L)$, where $L$ is any lower bound to $\rho_t^2$.


II. Notation


For the proofs, the notation of the previous paper (3) will be followed.
A slight modification here is that $m$ denotes the number of items, or
part-scores, in the test, while $n$ denotes the number of part-scores with observed
variance greater than zero. Only these $n$ actually variable part-scores affect
the reliability of the total scores, and we get more efficient lower bounds by
using $n$ in place of the larger number $m$.

We consider here only the case where $n \ge 2$, since we wish largely to
restrict ourselves to information about reliability that is obtainable from an
internal analysis of but one trial of the test. This requires that the test have
at least two subscores ($m \ge 2$), and in particular that at least two subscores
each have a variance greater than zero ($n \ge 2$). In what follows, we assume
that items of the test with positive observed variance are the only ones being
considered.
Let $x_{ijk}$ be the score of person $i$ on the $j$th item of the test (with positive
variance) on trial $k$ ($j = 1, 2, \cdots, n$), and let $t_{ik}$ be the sum of the $n$
part-scores


$t_{ik} = \sum_{j=1}^{n} x_{ijk}.$    (6)


The scoring scheme for any item can be arbitrary, except for the scores
to be given to nonattempted items. We shall assume that a nonattempted
item is given a score no higher than the lowest possible score for that item when
attempted. More specifically, we shall assume that no negative scores are
given to any item, and that nonattempted items are scored zero. Should
a scoring scheme originally allow for negative scores, it can be
converted into the non-negative form we require by addition of a suitable constant
to each part. This will not change the reliability coefficient $\rho_t^2$ in any way,
nor any of the variances required in our formulas. A non-negative scoring
scheme will yield total scores that correlate perfectly with those from the
original scheme from which it is derived by this adding of constants.


It should be clear that we are excluding from our present analysis the
case where an incorrect answer to an item is scored lower than omitting that
item.


Let $m_j$ be the maximum score obtainable on the $j$th item. Our assumption
is then that the scoring scheme is in such a form that for all $i$, $j$, and $k$


$0 \le x_{ijk} \le m_j,$    (7)


and in particular that $x_{ijk} = 0$ if person $i$ omits item $j$ on trial $k$.
The population of person: 


to be indefinitely large in ord 

Ultimately, only one trial fr 

provide empirical data for 
The expected values 


g is defined separately for each individual $i$

$\gamma_{x_{ij}x_{ig}} = E_k\, x_{ijk}x_{igk} - X_{ij}X_{ig}.$    (9)


It has been shown how the reliability coefficient $\rho_t^2$ of the total test score,
as well as lower bounds to $\rho_t^2$, depend directly on the quantity $\delta$ defined
from the covariances in (9) by

$\delta = \sum_{j \ne g} E_i\, \gamma_{x_{ij}x_{ig}}.$    (10)

Another way of writing the right member of (10), which is more convenient
for our purposes, is


$\delta = 2\sum_{j=1}^{n-1}\sum_{g=j+1}^{n} E_i\, \gamma_{x_{ij}x_{ig}}.$    (11)

The right member of (11) equals the right member of (10) by virtue of the
fact that, from (9), $\gamma_{x_{ij}x_{ig}} = \gamma_{x_{ig}x_{ij}}$, or these covariances are symmetric
in $g$ and $j$.

If part scores $g$ and $j$ are experimentally independent (that is, statistically
independent over trials) for the $i$th person, then $\gamma_{x_{ij}x_{ig}} = 0$. If these two
part scores are independent for all persons in the population, then
$E_i\, \gamma_{x_{ij}x_{ig}} = 0$. The converse need not hold, of course. Since nonzero
covariances can be either positive or negative, we can have
$E_i\, \gamma_{x_{ij}x_{ig}} = 0$ by having positive covariances for some people and
negative ones for others.

Similarly, if all items are statistically independent for all people, then
$\delta$ must vanish. However, we can have $\delta = 0$ even though not all items are
statistically independent for all people, for again positive and negative
covariances within and/or between pairs of items can cancel each other.
The role $\delta$ plays in lower bounds to $\rho_t^2$ is illustrated by the third universal
lower bound, $\lambda_3$, developed in (3),


$\lambda_3 = \frac{n}{n-1}\left(1 - \frac{\sum_{j=1}^{n}\sigma_j^2 + \delta}{\sigma_t^2}\right).$    (12)


The variances in the right member of (12) are defined as follows. The notation
for the expected (“true”) individual total scores on the test is


$T_i = E_k\, t_{ik}.$    (13)

The respective over-all means for part and total scores are

$\xi_j = E_i\, X_{ij}, \qquad \tau = E_i\, T_i.$    (14)

Then,

$\sigma_j^2 = E_i E_k\,(x_{ijk} - \xi_j)^2, \qquad \sigma_t^2 = E_i E_k\,(t_{ik} - \tau)^2.$    (15)

Each of the four types of parameters defined in (14) and (15) is observable
on a single trial. For any fixed value of $k$, consider only expectations
over $i$:


$E_i\, x_{ijk}, \quad E_i\, t_{ik}, \quad E_i\,(x_{ijk} - E_i\, x_{ijk})^2, \quad E_i\,(t_{ik} - E_i\, t_{ik})^2.$    (16)


other, etc.). Thus, the probability is unity that in any given trial k, the 
four quantities in (16) are respectively equal to 


with sampling error and assume for convenience that the operators $E_i$ and
$E_k$ are always over an infinite population or universe.


III. Some Basic Identities

We need some further notation to study what happens when items are
not attempted. Let


$p_{ijk} = \begin{cases} 1 & \text{if person } i \text{ attempts item } j \text{ on trial } k, \\ 0 & \text{otherwise.} \end{cases}$    (18)


Furthermore, let 


quantities in (17), т; is observable f 
E ри», or the Proportion attempti 


that рь = 1) and those in which he does 
Disk = Tuy = 0). 
Given notation (18), the following simple and important identity holds:


$p_{ijk}x_{ijk} = x_{ijk}.$    (20)

If $p_{ijk} = 0$, the score $x_{ijk}$ is zero by our convention that unattempted items
are scored zero. And if $p_{ijk} = 1$, then (20) holds by direct multiplication.

Further notation needed refers to certain expected values of the $x_{igk}$.
Let $X_{ig}^{(j)}$ be the expected value of $x_{igk}$ over that subset of trials in
which person $i$ attempts item $j$ (or for which $p_{ijk} = 1$),


$X_{ig}^{(j)} = E_k\, p_{ijk}x_{igk}/P_{ij}.$    (21)


Stating (21) for the case where $g = j$, using (20), and remembering (8) yield

$X_{ij}^{(j)} = X_{ij}/P_{ij}.$    (22)


Finally, we need a basic identity relating two covariances. Let $\gamma^{(j)}_{x_{ij}x_{ig}}$
denote the covariance between errors of unreliability, analogous to (9),
for person $i$ on items $j$ and $g$, but only over the sub-universe of trials in
which person $i$ attempts item $j$ (or where $p_{ijk} = 1$):


$\gamma^{(j)}_{x_{ij}x_{ig}} = E_k\, p_{ijk}(x_{ijk} - X_{ij}^{(j)})(x_{igk} - X_{ig}^{(j)})/P_{ij}.$    (23)


Using (20) and (22) in (23) and then multiplying through by $P_{ij}$ show that


$P_{ij}\gamma^{(j)}_{x_{ij}x_{ig}} = E_k\, x_{ijk}x_{igk} - P_{ij}X_{ij}^{(j)}X_{ig}^{(j)}.$    (24)

Then, from (9) and (24),

$\gamma_{x_{ij}x_{ig}} = P_{ij}\gamma^{(j)}_{x_{ij}x_{ig}} + X_{ij}(X_{ig}^{(j)} - X_{ig}).$    (25)


Identity (25) is our basic tool for examining the dependence among
experimental errors due to noncompletion of tests. It breaks the over-all covariance
between errors, $\gamma_{x_{ij}x_{ig}}$, into two component parts, as expressed by the right
member.

IV. A Basic Assumption and Its Consequences


Up until now, we have derived only identities or tautologies which are
universally true, given a non-negative scoring scheme. The basic assumption
from now on is that, if person $i$ attempts item $j$, then his score on any later
item $g$ ($g > j$) will be experimentally independent of his score on this attempted
item $j$. That is, we are considering here the case where dependence is due
solely to omissions, so that if a part is attempted, no further experimental
dependence holds. This may be true, for example, of pure speed tests as
well as many other tests, including some pure power tests where omissions
may be scattered and not consecutive. The experimental dependence between
items discussed in (1) is in particular true of pure speed tests: if a person
does not reach a certain item, he certainly will not reach later items on the
same trial; thus the dependence is due to nonattempts or omissions. We
shall assume that attempted items do not lead to experimental dependence
but only to dependence among true or expected scores. In particular, we
assume error covariances of the following type to vanish:


$\gamma^{(j)}_{x_{ij}x_{ig}} = 0 \qquad (g > j).$    (26)

If hypothesis (26) is true, then (25) reduces to


$\gamma_{x_{ij}x_{ig}} = X_{ij}(X_{ig}^{(j)} - X_{ig}) \qquad (g > j).$    (27)

According to (27), the total dependence over trials between item scores
is a function of the three expected values in the right member. Should $X_{ig}^{(j)}$
equal $X_{ig}$, or the fact that the item $j$ is attempted does not change the
expected value on the (later) item $g$, then we would have $\gamma_{x_{ij}x_{ig}} = 0$ according
to (27), or we would be back to what is assumed (implicitly or explicitly)
in all previous standard reliability formulas. But if $X_{ig}^{(j)} \ne X_{ig}$, then
experimental dependence must hold between items $j$ and $g$ for person $i$.

We now wish to establish a useful upper bound to $E_i\, \gamma_{x_{ij}x_{ig}}$. From (21),
since $p_{ijk} \le 1$ and all quantities involved are non-negative, we see that


$X_{ig}^{(j)} \le X_{ig}/P_{ij}.$    (28)

Using (28) in (27), and remembering (22),

$\gamma_{x_{ij}x_{ig}} \le X_{ij}^{(j)}X_{ig}(1 - P_{ij}) \qquad (g > j).$    (29)

From (7) and (21),

$X_{ig}^{(j)} \le m_g.$    (30)

Setting $g = j$ in (30) and using the result in (29) yield

$\gamma_{x_{ij}x_{ig}} \le m_j X_{ig}(1 - P_{ij}) \qquad (g > j).$    (31)

Taking expected values of both members of (31) over $i$ yields

$E_i\, \gamma_{x_{ij}x_{ig}} \le m_j E_i\, X_{ig}(1 - P_{ij}) \qquad (g > j).$    (32)

Now, from Schwarz' inequality,

$E_i\, X_{ig}(1 - P_{ij}) \le \sqrt{E_i\, X_{ig}^2 \cdot E_i\,(1 - P_{ij})^2}.$    (33)

Also, from (7) (written with $g$ in place of $j$),

$X_{ig}^2 \le m_g X_{ig},$    (34)


and from the fact that $0 \le P_{ij} \le 1$,


$(1 - P_{ij})^2 \le 1 - P_{ij}.$    (35)

Using (35) and (34) in (33), and then the result in (32) [remembering (14)]
produces the desired inequality


$E_i\, \gamma_{x_{ij}x_{ig}} \le m_j\sqrt{m_g\xi_g(1 - \pi_j)} \qquad (g > j).$    (36)

In (36), both $\pi_j$ and $\xi_g$ are observable from a single trial.
The upper bound to $\delta$ that we are seeking is obtained from (36) by
summing both members over $g$ and $j$ (except for $g = j$) according to (11),


$\delta \le 2\sum_{j=1}^{n-1}\left(m_j\sqrt{1 - \pi_j}\sum_{g=j+1}^{n}\sqrt{m_g\xi_g}\right).$    (37)

In a test composed only of dichotomous items, each of which is scored
zero or unity, we have $m_j = 1$. For such a test, (37) reduces to


$\delta \le 2\sum_{j=1}^{n-1}\left(\sqrt{1 - \pi_j}\sum_{g=j+1}^{n}\sqrt{\xi_g}\right) \qquad (m_j \le 1).$    (38)

Inequality (38) holds for $m_j \le 1$ as well as for $m_j = 1$, and we have so stated
it; this is the formula given in 3, p. 64. What we have defined above as
$D''$ in (4) is the right member of (38) as computed from a single trial.
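
The bound for arbitrary maximum item scores can be evaluated directly from single-trial statistics. The sketch below is illustrative; it assumes the bound has the form $2\sum_j m_j\sqrt{1 - \pi_j}\sum_{g>j}\sqrt{m_g\xi_g}$, with items listed in order of presentation, and the name `delta_bound_general` and the sample figures are hypothetical.

```python
import math


def delta_bound_general(xi, pi, m):
    """Upper bound to delta: 2 * sum_j m_j*sqrt(1 - pi_j) * sum_{g>j} sqrt(m_g * xi_g).

    xi : mean score on each item
    pi : proportion attempting each item
    m  : maximum obtainable score on each item
    """
    n = len(xi)
    total = 0.0
    for j in range(n - 1):
        later = sum(math.sqrt(m[g] * xi[g]) for g in range(j + 1, n))
        total += m[j] * math.sqrt(1.0 - pi[j]) * later
    return 2.0 * total


# Hypothetical dichotomous items (m_j = 1): the bound reduces to the simpler form.
xi = [0.8, 0.6, 0.5]
pi = [1.0, 0.9, 0.8]
bound_unit = delta_bound_general(xi, pi, [1.0, 1.0, 1.0])

# Doubling every item weight doubles m_j and xi_j, and multiplies the bound by four.
bound_doubled = delta_bound_general([2 * x for x in xi], pi, [2.0, 2.0, 2.0])
```

The scaling behaviour in the last line is what one expects of a bound on a covariance: rescaling all part scores by a constant rescales the bound by the square of that constant.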


Reviewing the proof shows that (37) holds for a much less restrictive
hypothesis than (26). We can assume the inequality


$\gamma^{(j)}_{x_{ij}x_{ig}} \le 0 \qquad (g > j)$    (39)


in place of only the equality of (26), and again arrive at (37). Under what
circumstances an actually negative error covariance can be justifiably
hypothesized remains a problem to be explored.

V. Another Upper Bound for $\delta$

Using assumption (26), or (39), it is possible to arrive at other useful
inequalities for $E_i\, \gamma_{x_{ij}x_{ig}}$ and for $\delta$ in place of (36) and (37). For example,
the following inequality will be established:


$E_i\, \gamma_{x_{ij}x_{ig}} \le \tfrac{1}{2} m_g\sqrt{m_j\xi_j(1 - \pi_j)} \qquad (g > j).$    (40)


For the proof of (40), use (30) in (27) to obtain

$\gamma_{x_{ij}x_{ig}} \le X_{ij}(m_g - X_{ig}) \qquad (g > j).$    (41)

Since the right members of (41) and (31) are always non-negative, their
geometric mean is never smaller than the smaller of the two, so we can
write from (41) and (31) that

$\gamma_{x_{ij}x_{ig}} \le \sqrt{m_j X_{ij}(1 - P_{ij})X_{ig}(m_g - X_{ig})} \qquad (g > j).$    (42)

Notice that the left member of (42) may be negative, or that (42) does
not refer to the absolute value of $\gamma_{x_{ij}x_{ig}}$. Now, the quantity $X_{ig}(m_g - X_{ig})$,
when regarded as a function of $X_{ig}$, reaches a maximum when $X_{ig} = m_g/2$,
so we always have


$X_{ig}(m_g - X_{ig}) \le m_g^2/4.$    (43)

Using (43) in (42) yields

$\gamma_{x_{ij}x_{ig}} \le \tfrac{1}{2} m_g\sqrt{m_j X_{ij}(1 - P_{ij})} \qquad (g > j).$    (44)

From Schwarz' inequality, and then notation (14) and (19),

$E_i\sqrt{X_{ij}(1 - P_{ij})} \le \sqrt{\xi_j(1 - \pi_j)}.$    (45)


Taking expectations over $i$ of both members of (44) and then using (45)
yield the desired inequality (40).


Summing both members of (40) over $g$ and $j$, remembering (11), yields
another upper bound to $\delta$,


$\delta \le \sum_{j=1}^{n-1}\left(\sqrt{m_j\xi_j(1 - \pi_j)}\sum_{g=j+1}^{n} m_g\right).$    (46)


For the special case of a test composed of dichotomous items scored 0 or 1,
or where $m_j = 1$, we have


$\sum_{g=j+1}^{n} m_g = n - j,$    (47)


so that for this special case (46) reduces to


$\delta \le \sum_{j=1}^{n-1}(n - j)\sqrt{\xi_j(1 - \pi_j)} \qquad (m_j = 1).$    (48)


What we have defined as $D'$ in (1) above is the right member of (48) as
computed from a single trial.

VI. A Third and Better Upper Bound for $\delta$; Further Possibilities


It is helpful in discussing the bounds to consider first the special case
of scoring where $m_j = 1$. For this case, (36) becomes

$E_i\, \gamma_{x_{ij}x_{ig}} \le \sqrt{\xi_g(1 - \pi_j)} \qquad (m_j = 1),$    (49)

while (40) becomes

$E_i\, \gamma_{x_{ij}x_{ig}} \le \tfrac{1}{2}\sqrt{\xi_j(1 - \pi_j)} \qquad (m_j = 1).$    (50)


Which of these provides a better bound to $E_i\, \gamma_{x_{ij}x_{ig}}$? That is, which has
the smaller right member?


In one respect, inequality (50) is better than (49): it has the factor 1/2.
Clearly, (49) will be better than (50) if and only if $\xi_g < \xi_j/4$ ($g > j$). Therefore,
if we define $\epsilon_{jg}$ by


$\epsilon_{jg} = \begin{cases} \sqrt{\xi_g(1 - \pi_j)} & \text{if } \xi_g < \xi_j/4, \\ \tfrac{1}{2}\sqrt{\xi_j(1 - \pi_j)} & \text{if } \xi_g \ge \xi_j/4, \end{cases}$    (51)


then we have an improved bound for $\delta$:


$\delta \le 2\sum_{j=1}^{n-1}\left(\sum_{g=j+1}^{n}\epsilon_{jg}\right).$    (52)

Inequality (52) is sharper than either (38) or (48). More generally, if $m_j \ne 1$,
we can define $\epsilon_{jg}$ to be the smaller of the two right members of (36) and (40)
and again write (52).

It is interesting that the matrix of the average error covariances, that is,
of the $E_i\, \gamma_{x_{ij}x_{ig}}$, is bounded in both (40) and (36) by a simplex matrix as
defined in (4). A simplex matrix is a symmetric matrix whose elements
are products of the form $a_j b_g$ ($g > j$). In (36), we can write
$a_j = m_j\sqrt{1 - \pi_j}$ and $b_g = \sqrt{m_g\xi_g}$, while in (40) we can write
$a_j = \sqrt{m_j\xi_j(1 - \pi_j)}$ and $b_g = \tfrac{1}{2} m_g$.
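
For dichotomous items the improved bound takes, pair by pair, whichever of the two item-pair bounds is smaller, so it can never exceed either of the two simpler sums. The sketch below illustrates this under stated assumptions: items are 0-1 scored and listed in order of presentation, the pairwise rule is the one in (51), and the names `epsilon`, `delta_bound_52`, `delta_bound_38`, and `delta_bound_48` together with the sample figures are hypothetical.

```python
import math


def epsilon(xi_j, xi_g, pi_j):
    """Pairwise bound for items j < g: the smaller of the (49) and (50) right members."""
    if xi_g < xi_j / 4.0:
        return math.sqrt(xi_g * (1.0 - pi_j))       # (49) is the smaller here
    return 0.5 * math.sqrt(xi_j * (1.0 - pi_j))     # otherwise (50) is no larger


def delta_bound_52(xi, pi):
    """Improved bound: twice the sum of epsilon over all pairs j < g."""
    n = len(xi)
    return 2.0 * sum(epsilon(xi[j], xi[g], pi[j])
                     for j in range(n - 1) for g in range(j + 1, n))


def delta_bound_38(xi, pi):
    """Bound built from (49) alone for every pair."""
    n = len(xi)
    return 2.0 * sum(math.sqrt(xi[g] * (1.0 - pi[j]))
                     for j in range(n - 1) for g in range(j + 1, n))


def delta_bound_48(xi, pi):
    """Bound built from (50) alone for every pair."""
    n = len(xi)
    return sum((n - 1 - j) * math.sqrt(xi[j] * (1.0 - pi[j])) for j in range(n - 1))


# Hypothetical speeded test: later items are attempted and passed less often.
xi = [0.9, 0.6, 0.2]
pi = [0.95, 0.8, 0.6]
improved = delta_bound_52(xi, pi)
```

In this example the pair formed by the first and last items uses (49), since the last item's proportion correct is below one quarter of the first item's, while the other two pairs use (50).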

There are important special kinds of tests which necessarily have internal
simplex features and not only a simplex type of upper bound matrix for
error covariances. Three such are: (a) a pure speed test (where everything
attempted is done correctly); (b) a test composed of a single question like
“Write down all the words you can that begin with the letter t”; (c) a
power test in which, if a person decides not to try the $j$th item, he will try
no more items. In each such case, it follows from notation (18) that

$p_{ijk}\,p_{igk} = p_{igk} \qquad (g > j).$    (53)


Condition (53) states that if person $i$ does not attempt item $j$, then he does
not attempt any items beyond $j$. Or if he cannot produce $j$ words beginning
with the letter t, he cannot produce $g$ words where $g > j$.

Multiplying (53) through by $x_{igk}$ and using (20) yield


$p_{ijk}\,x_{igk} = x_{igk} \qquad (g > j).$    (54)

Using (54) in (21) yields

$X_{ig}^{(j)} = X_{ig}/P_{ij} \qquad (g > j),$    (55)
showing that the inequality in (28) cannot be pA bw Ra а 

1 ег - 

actually attained in our present special case 0 d | 
equales above correspondingly become equalities. Furthermore, it deed 
feasible to obtain better bounds than those based on л by pivoting instea 


on А% of (3). 
To return to the general case where (53) does not hold, further formulas
are also possible of the split-half type, based on $\lambda_4$ of (3). The new $\delta$ to be
bounded consists of the covariance between two sums of errors, one sum from
each half of the split. This covariance is always a sum of item error covariances,
and can be bounded immediately by using our formulas for $\epsilon_{jg}$ above. The
$\epsilon_{jg}$ should be summed the way the split calls for, and the result can be used
to bound $\delta$ in the formula for $\lambda_4$. Using split-halves, while it requires only
two sub-variances to be computed, does not avoid the problem of taking
into account the possible experimental dependence between the halves,
and this can be studied rigorously only itemwise, as through $\epsilon_{jg}$.

———— 


Erratum in Guttman, Louis. Reliability formulas that do not assume 
experimental independence. Psychometrika, 1953, 18, 225-239. 
On page 231 in formula (21) and in 


each of the preceding two lines, 
X; should replace z; throughout. 
A MATHEMATICAL MODEL FOR CONDITIONING* 


G. W. BOGUSLAVSKY 
CORNELL UNIVERSITY 


It is postulated that occurrence of a conditioned response depends on 
recurrence of one of a finite number of specific vigilance reactions. Number 
of trial on which a conditioned response occurs is shown to be a sufficient 
statistic for estimating the number of such vigilance reactions. The hypothesis 
is tested by noting whether numbers of trials on which conditioned responses 
occur fall within confidence intervals determined on the basis of a selected 
sufficient statistic. Applications of the model to psychological research are 
suggested. 


I. Introduction and Postulates 


Systematic treatment of behavior has generally followed the pattern of 
functional relation between stimuli and responses, with intervening processes 
inferred from these two variables and viewed as theoretical constructs. 
There is reason to believe, however, that in some instances such processes 
may be treated as independent events. The specific reference is to behavior 
patterns which Pavlov grouped under the term “orienting reflex” (13, p. 134). 
Though Pavlov insisted that these disappear with the progress of conditioning 
(14, p. 94), Guthrie has presented a convincing argument to the contrary 
(5, p. 74), and one of Pavlov’s own statements (12, p. 385) may be interpreted 
as refuting the original thesis. 

A series of observations at the Cornell Behavior Farm has led the 
author to conclude that occurrence of orientation to conditioned stimuli 
is the rule rather than the exception. The more conspicuous features of this 
phenomenon are: circumscribed variability of pattern, synergy of action, 
facilitating effect of the general static reaction on the ensuing activity, 
and autonomic concomitants manifested by changes in respiration and 
cardiac output. All of these observations have either direct or inferential 
support in scientific literature (4, p. 3; 16, pp. 129, 305, 3+2; 2, p. 505; 19, p. 
13; 7, p. 668; 10, p. 139). 

It is a ee that inclusion of orienting behavior 1n a theoretical 


model would improve accuracy of prediction. However, since precise desig- 
nation of the reactions and of the conditions governing their emergence is 
impractical, the use of monotonic functions to express relations between 
7 
Proms doctors dissertation at CO S. Liddell, under whose direction this researc: 


the invaluable advice an е » 
7 i Н to Dr. Jack Kiefer o 
Орды specia, debt of grotitude іа A og materially in the development of the 


ment of mathematics, whose skill and 
mathematical portions of this paper. 
variables must be abandoned. Accordingly, the present model has been 
designed on the basis of a system of operations which do not depend on the 
full knowledge of antecedent conditions. 

In view of the limited connotation of the term “orienting,” the author 
will follow Liddell’s precedent (9, p. 160) in referring to the animal’s immediate 
responses to the conditioned stimulus as specific vigilance reactions, or, in 
abbreviated form, as SVR’s. 

The following postulates state formally the author’s theoretical position: 

1. In a given situation a supraliminal sensory stimulus evokes in an 
organism one of a bounded set of N discrete and mutually exclusive specific 
vigilance reactions. 

2. Sensory stimulation immediately consequent upon the performance of 
each specific vigilance reaction becomes a conditioned stimulus for any response 
with which it is contiguous. 

The first proposition implies that, with the occurrence of each stimulus, 
the N members of the set of SVR’s compete one against another, with the 
ultimate outcome determined by unspecified factors extraneous to the 
stimulus. As long as the respective probabilities of the several outcomes 
are unknown, the author must assume, in the present development, that the 
Proposition does not, however, preclude 
у evaluated probabilities of identifiable 
roblem of development along these lines 


es will now be examined with the 
rving as the model. 


II. Probability Distribution of $n_k$ 


Stimuli are presented one at a time, independently of each other, the 
probability of a given stimulus evoking each of the SVR’s being 1/N. 
An instance of recurrence is defined as evocation of an SVR which had 
been evoked on one or more preceding trials. The statement “k instances of 
recurrence” refers to the number of trials characterized by such recurrences. 
It does not imply that the same SVR has occurred k times; the number of 
SVR’s involved in k instances of recurrence may be anywhere from one to k. 

The variable $n_k$ is defined to be the number of the trial on which the 
kth instance of recurrence takes place. By deduction from the postulates it 
is also the number of the trial on which the conditioned response appears 
for the kth time. 

As an illustration of the preceding definitions, consider the four suits 
of a deck of cards as representing four different SVR’s. In drawing with 
replacements the sequence H, S, H, D, S, S, C, one obtains three instances 
of recurrence: that of H on the third drawing, and that of S on the fifth 
and sixth drawings. Thus, $n_1 = 3$, $n_2 = 5$, and $n_3 = 6$. Since all suits appeared 
during the sequence, all subsequent drawings will be instances of recurrence. 
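
The card illustration can be checked mechanically. The following is a minimal sketch, not part of the original paper; the function name `recurrence_trials` is a hypothetical helper introduced only for this illustration. It scans a sequence of evoked SVR's and records the trial numbers $n_1, n_2, \ldots$ on which a previously evoked SVR recurs.

```python
def recurrence_trials(sequence):
    """Return the trial numbers n_1, n_2, ... on which a recurrence occurs.

    A trial is an instance of recurrence when the SVR evoked on it
    had already been evoked on one or more preceding trials.
    """
    seen = set()      # SVR's evoked so far
    trials = []       # trial numbers of the successive recurrences
    for trial, svr in enumerate(sequence, start=1):
        if svr in seen:
            trials.append(trial)
        else:
            seen.add(svr)
    return trials


# The drawing H, S, H, D, S, S, C from the text.
print(recurrence_trials("HSHDSSC"))
```

For the sequence in the text the recurrences fall on the third, fifth, and sixth drawings, in agreement with the values of $n_1$, $n_2$, and $n_3$ stated above.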

Because $n_k$ trials include k instances of recurrence, the total number 
of different SVR's evoked is $n_k - k$. Also, since the $n_k$th trial is one during 
which an instance of recurrence takes place, the total number of different 
SVR’s evoked immediately prior to that trial is also $n_k - k$. Thus, at the 
end of the $(n_k - 1)$th trial as well as at the end of the $n_k$th trial, the animal has 
in its repertory $N - n_k + k$ different unevoked SVR’s. Accordingly, the prob- 
ability distribution of $n_k$ may be written as 

$P_N\{n_k = j\}$ = (probability that after j − 1 trials the animal’s repertory 
contains N − j + k different unevoked SVR’s) × 
(probability that on jth trial one of the previously 
evoked j − k SVR’s is elicited again). 

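Evaluations of these two probabilities have been derived by Feller 
(3, pp. 69, 313). Allowing for differences in symbols, the substitution yields 

$$P_N\{n_k = j\} = \binom{N}{N-j+k} \sum_{\nu=0}^{j-k} (-1)^{\nu} \binom{j-k}{\nu} \left(1 - \frac{N-j+k+\nu}{N}\right)^{j-1} \cdot \frac{j-k}{N}$$

$$= \binom{N}{j-k} \frac{j-k}{N^{j}} \sum_{\nu=0}^{j-k} (-1)^{\nu} \binom{j-k}{\nu} (j-k-\nu)^{j-1}. \qquad (1)$$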
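
As a check on the verbal definition of $P_N\{n_k = j\}$, the closed form can be evaluated exactly for small cases. The sketch below is an illustration only and makes two assumptions that are not in the original: the function name `pmf_nk` is hypothetical, and the formula coded is the occupancy expression (probability that exactly N − j + k SVR's are unevoked after j − 1 trials) multiplied by (j − k)/N, as the text describes.

```python
from fractions import Fraction
from math import comb


def pmf_nk(N, k, j):
    """Exact P_N{n_k = j}: the kth recurrence falls on trial j."""
    m = j - k  # number of different SVR's evoked before trial j
    if m < 1 or m > N:
        return Fraction(0)
    # Probability that j - 1 trials evoke exactly m different SVR's
    # (inclusion-exclusion over the SVR's that might be missed).
    exactly_m = comb(N, m) * sum(
        (-1) ** v * comb(m, v) * Fraction(m - v, N) ** (j - 1)
        for v in range(m + 1)
    )
    # Trial j must then evoke one of the m SVR's already evoked.
    return exactly_m * Fraction(m, N)


# With N = 4 and k = 2 the kth recurrence can fall on trials 3 through 6.
for j in range(3, 7):
    print(j, pmf_nk(4, 2, j))
```

Because the probabilities are kept as exact fractions, one can confirm directly that they sum to one over the admissible trials j = k + 1, ..., N + k.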


For the cumulative distribution of $n_k$ another readily verifiable function 
derived by Feller (3, p. 77) is directly applicable. 

$P_N\{n_k \le j\}$ = probability that by the end of jth trial there have been 
k or more instances of recurrence of SVR’s 

= probability that by the end of jth trial there are at 
least N − j + k different SVR’s not yet evoked 


$$= \sum_{\nu=N-j+k}^{N} (-1)^{\nu-(N-j+k)} \binom{\nu-1}{N-j+k-1} \binom{N}{\nu} \left(1 - \frac{\nu}{N}\right)^{j}. \qquad (2)$$
III. Sufficiency of the Statistic $n_k$ 


For a discrete case, $t_k = t(x_1, x_2, \ldots, x_k)$ is said to be a sufficient statistic 
for the parameter N if, whenever $p_N(t_k) > 0$, the conditional probability 
function $p_N(x_1, \ldots, x_k \mid t_k)$ does not depend on N. 

A necessary and sufficient condition that $t_k$ be sufficient is that the 
joint probability function of $x_i$’s can be written 

$$p_N(x_1, \ldots, x_k) = g(t_k, N)\, h(x_1, \ldots, x_k), \qquad (3)$$

where $g \ge 0$, $h \ge 0$, g depends on $x_i$’s only through the function $t_k$, and h 
depends on $x_i$’s in any way, but does not depend on N. 

It will now be shown that the joint probability distribution of the 
variables $n_i$, $1 \le i \le k$, can be written in the form (3), where $n_k$ takes the 
place of $t_k$, and that, therefore, $n_k$ is a sufficient statistic for the parameter N. 


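Let $X_i + 1$ be the number of trials between (i − 1)th and ith instances 
of recurrence of SVR's (not counting the former, but counting the latter). 
For i = 1 

$P_N\{X_1 = x_1\}$ = probability that the first $x_1$ stimuli evoke $x_1$ different 
SVR’s, and the $(x_1 + 1)$th stimulus evokes an SVR 
which had occurred earlier, 

$$= \frac{N}{N} \cdot \frac{N-1}{N} \cdots \frac{N-x_1+1}{N} \cdot \frac{x_1}{N} = \frac{N!\, x_1}{N^{x_1+1}(N-x_1)!}. \qquad (4)$$

The conditional probability distributions for the variables $X_i$, i > 1, 
may be derived in an analogous manner. For i = 2 

$$p_N(x_2 \mid x_1) = \frac{N-x_1}{N} \cdot \frac{N-x_1-1}{N} \cdots \frac{N-x_1-x_2+1}{N} \cdot \frac{x_1+x_2}{N} = \frac{(N-x_1)!\,(x_1+x_2)}{N^{x_2+1}(N-x_1-x_2)!}. \qquad (5)$$

Designating for convenience 

$$S_\nu = \sum_{i=1}^{\nu} x_i, \qquad (6)$$

the conditional probability distribution for the general case is 

$$p_N(x_\nu \mid x_1, \ldots, x_{\nu-1}) = \prod_{i=0}^{x_\nu-1} \frac{N-S_{\nu-1}-i}{N} \cdot \frac{S_\nu}{N} = \frac{(N-S_{\nu-1})!\, S_\nu}{N^{x_\nu+1}(N-S_\nu)!}. \qquad (7)$$
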
The joint probability distribution of the variables $X_i$, $1 \le i \le k$, is 
obtained by multiplying the probability distribution of the initial variable $X_1$ 
by the product of the conditional probability distributions of the remaining 
variables $X_\nu$, $\nu > 1$, 

$$p_N(x_1, \ldots, x_k) = p_N(x_1) \prod_{\nu=2}^{k} p_N(x_\nu \mid x_1, \ldots, x_{\nu-1}) = \frac{N!\, x_1}{N^{x_1+1}(N-x_1)!} \prod_{\nu=2}^{k} \frac{(N-S_{\nu-1})!\, S_\nu}{N^{x_\nu+1}(N-S_\nu)!}$$

$$= \frac{N!}{N^{S_k+k}(N-S_k)!} \prod_{\nu=1}^{k} S_\nu \qquad (8)$$

$$= \frac{N!}{N^{n_k}(N-n_k+k)!} \prod_{\nu=1}^{k} S_\nu, \qquad (9)$$

where $n_k = S_k + k$, the relationship being derived from definitions presented 
earlier. 

Inspection of (9) shows that the function represented by the first factor 
does not depend on the variables $n_i$ except through $n_k$; the function 
represented by the remaining product does not depend on N. Thus the 
criterion of (3) is satisfied, showing that $n_k$ is a sufficient statistic. This 
implies that estimation of the parameter N may be made solely from the 
knowledge of $n_k$; and that knowledge of the values of $n_i$, i < k, provides 
no additional information. 


IV. Tests of Hypotheses 


The foregoing discussion indicates that description of the progress of 
conditioning is reducible to the single parameter N. Although many specific 
vigilance reactions may be identified with accuracy from their unique postural 
components, precise evaluation of N by direct observation is hardly feasible 
at this stage. Accordingly, the experimenter must resort to estimation of N 
from information gathered during the process of conditioning. One approach 
is to proceed with presentation of stimuli until the kth occurrence of the 
conditioned response, k being a previously selected constant. With an observed 
value j of $n_k$, either point estimation or interval estimation of the parameter 
N may be made. One possible procedure for point estimation, the so-called 
maximum-likelihood procedure, involves the use of (1), where N is chosen 
so that the value of $P_N\{n_k = j\}$ is maximized. Procedure for interval estima- 
tion will be described in the section on confidence intervals. 

Another approach to the evaluation of N consists of extending the 
experiment to the stage at which the animal performs a conditioned response 
unfailingly on every trial. From the postulates it follows that those trials on 
which no conditioned responses occur are trials characterized by novel 
SVR’s. Hence, the total number of trials on which no conditioned responses 
occurred is directly equivalent to N. The practical problem in this approach 
is clearly that of defining the criterion of conditioning. The following discus- 
sion deals with tests of hypotheses concerning the parameter N. While the 
subject is one of intrinsic interest, the discussion is introduced at this point 
as a preliminary step in the development of a procedure for interval estima- 
tion. 

Let $N_0 < N_1$ be two specified positive integers, and designate by $H_0$ 
the hypothesis that $N = N_0$ and by $H_1$ the alternative hypothesis that 
$N = N_1$. A test of $H_0: N = N_0$ against $H_1: N = N_1$ involves a single alter- 
native and a single observation. Accordingly, construction of such a test 
consists of selecting a critical region such that 

$$\frac{P_{N_1}\{n_k = j\}}{P_{N_0}\{n_k = j\}} > c, \qquad (10)$$

where c is chosen so that the probability of the critical region under $H_0$ is θ. 

Substitution from (1) into (10), with the factors not involving N can- 
celing out, yields 

$$L(j) = \frac{N_0^{\,j}\, N_1!\, (N_0 - j + k)!}{N_1^{\,j}\, N_0!\, (N_1 - j + k)!} > c. \qquad (11)$$

It will be noted that 

$$\frac{L(j+1)}{L(j)} = \frac{N_0 (N_1 - j + k)}{N_1 (N_0 - j + k)} > 1, \qquad (12)$$

the last inequality being a consequence of the fact that $N_1 > N_0$ and j > k. 
Hence L(j) is a strictly increasing function of j, and the critical region of (10) 
consists of the larger values of j. The test so defined is a uniformly most 
powerful test, for the specified probability θ of Type I error, of 
$H_0: N = N_0$ against $H_1: N > N_0$, since 
the constant depends only on $N_0$. It is also a uniformly most powerful test of 
the hypothesis $H_0: N \le N_0$ against $H_1: N > N_0$, since the probability of 
Type I error for any $N < N_0$ is less than that for $N_0$ under this test. 

The criterion stated in (10) is thus equivalent to the rule to reject 
$H_0: N = N_0$ if $n_k$ takes on a value j such that 

$$j \ge b, \qquad (13)$$

where b is chosen so that 

$$P_{N_0}\{n_k \ge b\} = \theta. \qquad (14)$$

Similarly, a uniformly most powerful test of the hypothesis $H_0: N = N_0$ 
(or $N \ge N_0$) against $H_1: N < N_0$ is given by the rule to reject $H_0: N = N_0$ 
if $n_k$ takes on a value j such that 

$$j \le a, \qquad (15)$$

where a is chosen so that 

$$P_{N_0}\{n_k \le a\} = \theta. \qquad (16)$$

For testing the hypothesis $H_0: N = N_0$ against $H_1: N \ne N_0$, a uniformly 
most powerful test of size θ does not exist. An approximation to the best 
unbiased test is to choose two numbers a and b such that 

$$P_{N_0}\{n_k \le a\} = P_{N_0}\{n_k \ge b\} = \theta/2, \qquad (17)$$

and to reject $H_0: N = N_0$ by the criteria stated in (13) and (15). 
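
The critical values a and b of rules (13) through (17) can be found numerically. The sketch below is an illustration under stated assumptions, not the author's procedure: the function names are hypothetical, the cumulative distribution of $n_k$ is obtained by stepping through the trials and tracking how many different SVR's have been evoked (rather than by the closed forms (1) or (2)), and, because $n_k$ is discrete, the tail probability is required to be at most θ instead of exactly θ.

```python
from fractions import Fraction


def cdf_nk(N, k, j):
    """P_N{n_k <= j}: at least k recurrences have occurred within j trials."""
    # dist[d] = probability that exactly d different SVR's have been evoked
    dist = {0: Fraction(1)}
    for _ in range(j):
        nxt = {}
        for d, p in dist.items():
            if d < N:   # a novel SVR is evoked
                nxt[d + 1] = nxt.get(d + 1, 0) + p * Fraction(N - d, N)
            if d > 0:   # a previously evoked SVR recurs
                nxt[d] = nxt.get(d, 0) + p * Fraction(d, N)
        dist = nxt
    # k or more recurrences  <=>  at most j - k different SVR's evoked
    return sum((p for d, p in dist.items() if d <= j - k), Fraction(0))


def upper_critical(N0, k, theta):
    """Smallest b with P_{N0}{n_k >= b} <= theta; reject H0 when j >= b."""
    b = k + 1
    while 1 - cdf_nk(N0, k, b - 1) > theta:
        b += 1
    return b


def lower_critical(N0, k, theta):
    """Largest a with P_{N0}{n_k <= a} <= theta; reject H0 when j <= a."""
    a = k  # n_k can never be as small as k, so this tail is empty
    while a < N0 + k and cdf_nk(N0, k, a + 1) <= theta:
        a += 1
    return a


# Two-sided rule of (17) for N0 = 50, k = 3, theta = .20 (theta/2 per tail).
half = Fraction(1, 10)
print(lower_critical(50, 3, half), upper_critical(50, 3, half))
```

Requiring each tail to be at most θ/2 makes the test conservative; a different convention for handling the discreteness of $n_k$ would shift a or b by one trial at most.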


V. Confidence Intervals 

The best one-sided confidence limits on N can be easily constructed. 

Thus, the rule of (13) and (14) is equivalent to the rule of accepting 
$H_0: N = N_0$ if $n_k$ takes on a value j such that 

$$j \le b - 1, \qquad (18)$$

where b is chosen so that 

$$P_{N_0}\{n_k \le b - 1\} \ge 1 - \theta. \qquad (19)$$

With θ selected arbitrarily, values of b are calculated for different values 
$N_0$ by means of (2), where j is the symbol for b − 1. For each $N_0$ let $b(N_0)$ 
designate the value thus calculated. Each value $b(N_0) - 1$ is now plotted 
as the ordinate above the corresponding value $N_0$ of N on the horizontal 
axis. Then, for each value $N_0$ of N, 

$$P_{N_0}\{n_k \le b(N_0) - 1\} = 1 - \theta. \qquad (20)$$

Let $L(n_k)$ be the inverse of $b(N_0) - 1$, thus designating N as a function 
of $n_k$. Since $b(N_0) - 1$ is the maximal value corresponding to $N_0$, given 
the condition (19), clearly, for a specified $n_k$, $L(n_k)$ is the minimal value; 
i.e., for the selected θ, the value $b(N_0) - 1$ of $n_k$ may be obtained only with 
those values of $N_0$ which are equal to or exceed $L(n_k)$. Thus, (20) is equivalent 
to 

$$P_{N_0}\{N_0 \ge L(n_k)\} \ge 1 - \theta. \qquad (21)$$

The last equation implies that, whatever the true value $N_0$ of N, the 
probability is 1 − θ that the chance variable $n_k$ will come up so that 
$N_0 \ge L(n_k)$. In other words, $L(n_k)$ is a one-sided confidence limit on N of 
confidence coefficient 1 − θ. 

Similarly, for each value $N_0$ of N a value $a(N_0)$, corresponding to the 
a of inequality (15), is calculated. The values $a(N_0) + 1$, plotted as before, 
yield the relation 

$$P_{N_0}\{n_k \ge a(N_0) + 1\} = 1 - \theta, \qquad (22)$$

and its inverse form 

$$P_{N_0}\{N_0 \le U(n_k)\} = 1 - \theta, \qquad (23)$$

the last equation implying that $n_k$ will come up so that the probability of 
$U(n_k)$ being equal to or greater than the true value of N is always 1 − θ, 
whatever be that true value of N. 

The methods for constructing one-sided confidence limits are best 
because the corresponding tests of hypotheses from which they are derived 
are uniformly most powerful. Since there is no uniformly most powerful 
test of $H_0: N = N_0$ against $H_1: N \ne N_0$, the two-sided confidence interval 
may be constructed by a procedure approximating an unbiased test, or one 
with the shortest acceptance region. For each value $N_0$ of N a value $a(N_0)$ 
and a value $b(N_0)$, corresponding to a and b of (17), are calculated. These 
are such that, whatever the true $N_0$, 

$$P_{N_0}\{a(N_0) + 1 \le n_k \le b(N_0) - 1\} \ge 1 - \theta, \qquad (24)$$

and, inversely, 

$$P_{N_0}\{L(n_k) \le N_0 \le U(n_k)\} = 1 - \theta. \qquad (25)$$

Since $n_k$ is a discrete chance variable, it is not always possible to find 
values of a and b which yield exactly θ for each $N_0$. A conservative procedure 
of designating the limiting values $L(n_k)$ and $U(n_k)$ of (25) would be to state 
the integral values of N which lie nearest the limits, outside the interval 
defined by $n_k$ and 1 − θ. 


Figure 1 gives values of $j_U$ and $j_L$, for $1 \le k \le 5$, such that 

$$P_N\{n_k \le j_U\} = P_N\{n_k \ge j_L\} = .90, \qquad (26)$$

with $j_U$ and $j_L$ chosen so as to make this probability as little greater than .90 
as possible. The upper limits are designated U and the lower L. The number 
preceding U or L refers to the value of k for which the limit was computed. 
Thus, 

$$P_{200}\{n_2 \le j_U\} = .90 \qquad (27)$$

is interpreted as “the probability is .90 that, given N = 200, the second 
recurrence of an SVR will take place no later than $j_U$th trial.” 

Locating 200 on the horizontal axis, one proceeds vertically to the curve 
2U, and thence horizontally to the vertical axis which is met at $n_2 = 39$; 
the latter is the value which satisfies the condition (27). Similarly, the lower 
limit of $n_2$ may be evaluated, showing that 

$$P_{200}\{n_2 \ge 17\} = .90. \qquad (28)$$


FIGURE 1 
Confidence Limits on N 


Combining (27) and (28), one obtains an approximation to be used in 
constructing the best unbiased confidence interval on N, defined by the 
parameter 200 and the confidence coefficient .80. Thus, 

$$P_{200}\{17 \le n_2 \le 39\} = .80. \qquad (29)$$

The same procedure is followed in locating confidence intervals for other 
values of N and k within the limits of the chart. 
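
The chart readings in (27) through (29) can be compared with direct computation. The following sketch is an illustration only; the function name `prob_nk_at_most` is hypothetical, and the probabilities are obtained by propagating, trial by trial, the distribution of the number of different SVR's evoked, which is one convenient way to evaluate the cumulative distribution and is not claimed to be the method used to draw Figure 1.

```python
def prob_nk_at_most(N, k, j):
    """P_N{n_k <= j}, computed in floating point."""
    # dist[d] = probability that exactly d different SVR's have been evoked
    dist = [0.0] * (N + 1)
    dist[0] = 1.0
    for _ in range(j):
        nxt = [0.0] * (N + 1)
        for d, p in enumerate(dist):
            if p == 0.0:
                continue
            if d < N:
                nxt[d + 1] += p * (N - d) / N   # novel SVR
            nxt[d] += p * d / N                 # recurrence
        dist = nxt
    # at least k recurrences  <=>  at most j - k different SVR's evoked
    return sum(dist[: max(j - k, -1) + 1])


upper = prob_nk_at_most(200, 2, 39)        # P_200{n_2 <= 39}, cf. (27)
lower = 1 - prob_nk_at_most(200, 2, 16)    # P_200{n_2 >= 17}, cf. (28)
both = upper + lower - 1                   # P_200{17 <= n_2 <= 39}, cf. (29)
print(round(upper, 3), round(lower, 3), round(both, 3))
```

The computed values should be close to, though not necessarily identical with, the .90 and .80 quoted in the text, since the limits in the text were read from a chart.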

In a typical conditioning situation N is unknown, and the problem is 
one of estimating this parameter from the variates $n_k$. The chart shown in 
Figure 1 fulfills this function if used inversely. Assuming, for example, that 
the fifth conditioned response occurs on trial 20, a horizontal line is drawn 
from the vertical axis at 20, and ordinates are dropped from its intersections 
with 5U and 5L. These meet the horizontal axis at 25 and 60. The estimated 
interval on N is now 25 to 60; that is, given that the fifth conditioned response 
occurred on trial 20, it may be stated with 80% confidence that the interval 
25 to 60 includes the true number of specific vigilance 
reactions possessed by the organism in the given situation. 


VI. A Test of the Model 


The model may be tested by observing whether, for a given N, the 
joint distribution of $n_i$’s is within the region of acceptance selected in such 
a manner that its total probability is 1 − θ. While this method of testing is 
highly desirable, it is not feasible at the present stage of the development of 
the model, inasmuch as it depends on the knowledge of the true value of N. 
It has been possible, however, to construct a test which does not depend on 
the knowledge of this parameter. The procedure for such a test is described 
below. 

In Section III it was shown that, if the model is true, the statistic $n_k$ is 
sufficient, and that, consequently, the conditional probability distribution 

$$P_N\{n_1 = j_1, n_2 = j_2, \ldots, n_{k-1} = j_{k-1} \mid n_k = j_k\}$$

does not depend on N. 
This distribution is given by 

$$\frac{p_N(j_1, \ldots, j_k)}{p_N(j_k)} = \frac{\prod_{\nu=1}^{k} S_\nu}{\sum \prod_{\nu=1}^{k} S_\nu}, \qquad (30)$$

where the expression on the right is derived from (8), and the summation in 
the denominator is taken over all possible sets of values of $S_\nu$ such that 

$$1 \le S_1 \le S_2 \le \cdots \le S_{k-1} \le S_k = j_k - k. \qquad (31)$$

An appropriate summation of the numerator in (30) yields the condi- 
tional probability distribution of a single variable $n_i$. Thus, 

$$P(j_i \mid j_k) = \frac{\left(\sum \prod_{\nu=1}^{i} S_\nu\right)\left(\sum \prod_{\nu=i+1}^{k} S_\nu\right)}{\sum \prod_{\nu=1}^{k} S_\nu}, \qquad (32)$$

where sums in the numerator are taken over all possible sets of values of 
$S_\nu$ such that, in the summation of the first product, 

$$1 \le S_1 \le S_2 \le \cdots \le S_{i-1} \le S_i = j_i - i, \qquad (33)$$

and, in the summation of the second product, 

$$j_i - i = S_i \le S_{i+1} \le \cdots \le S_{k-1} \le S_k = j_k - k. \qquad (34)$$

Finally, a summation of (32) over the indicated values of the variable 
in the numerator yields the cumulative conditional probability distribution 
of $n_i$, 

$$P\{n_i \le z \mid n_k = j_k\} = \frac{\sum_{S_i=1}^{z-i} \left(\sum \prod_{\nu=1}^{i} S_\nu\right)\left(\sum \prod_{\nu=i+1}^{k} S_\nu\right)}{\sum \prod_{\nu=1}^{k} S_\nu}, \qquad (35)$$

with conditions for the other sums defined by (31), (33), and (34). 
Equation (35) provides a direct test of the model. Thus, if the model is 
true, the observed value z of $n_i$ will be such that, with probability 1 − θ, 

$$\frac{\theta}{2} \le P\{n_i \le z \mid n_k = j_k\} \le 1 - \frac{\theta}{2}. \qquad (36)$$


Selecting k = 5, and designating the critical area θ = .20, test observations 
were made on four goats in a conditioning situation. An auditory signal 
served as the neutral stimulus, and flexion of the right foreleg, unconditionally 
evoked by an electric shock, as the response. The results are presented in 
Table 1. 


TABLE 1 

Numbers of Trials on Which CR's Occurred 
and Their Cumulative Conditional Probabilities 


Capital letters in the top row of Table 1 are code letters of the four 
animals. Beneath each code letter is the number of the trial on which that 
animal gave the fifth CR. The column labeled i contains ordinal numbers of 
the first four CR’s. Columns labeled j; give the numbers of trials on which ith 
labeled P give the cumulative conditional prob- 

Е (35). Thus, by way of illustration, goat УА 
0, and its third CR on trial 16. The probability 


that an animal which gave its fifth CR on trial 20 should have given its 
third CR no later than trial within the arbitrarily 
selected acceptance in : Ат 
Inspection of Table 1 shows that only two of the sixteen probabilities 
fail to meet the criterion of acceptance. These are probabilities computed 
for the fourth CR of H and the first CR of P. The extreme value of H may, 
however, be explained by the fact that the conditional probability of the 
fourth CR having occurred exactly on trial 15, given that the fifth CR took 
place on trial 16, is .49. Of course, the four tests corresponding to different 
rows in any column of Table 1 are not independent of each other; however, 
as a rough indication of the validity of the model, Table 1 presents a convinc- 
ing demonstration. 
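
The conditional probability of .49 quoted above can be reproduced by direct enumeration. The sketch below is an illustration under stated assumptions: the function name `conditional_pmf` is hypothetical, and the weights follow from the sufficiency argument of Section III, in which every admissible non-decreasing set $S_1 \le \cdots \le S_{k-1} \le S_k$ (with $S_1 \ge 1$ and $S_\nu = n_\nu - \nu$) has conditional probability proportional to the product of its $S_\nu$.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import prod


def conditional_pmf(i, k, j_k):
    """Distribution of n_i given n_k = j_k, as {trial number: probability}."""
    s_k = j_k - k
    weights = {}
    # Each tuple is a non-decreasing set (S_1, ..., S_{k-1}) with 1 <= S <= S_k.
    for path in combinations_with_replacement(range(1, s_k + 1), k - 1):
        trial = path[i - 1] + i                      # n_i = S_i + i
        weights[trial] = weights.get(trial, 0) + prod(path)
    total = sum(weights.values())
    return {t: Fraction(w, total) for t, w in sorted(weights.items())}


# Fourth CR on trial 15, given that the fifth CR took place on trial 16.
p = conditional_pmf(4, 5, 16)[15]
print(float(p))
```

The factor $S_k$ is common to every term and cancels, so it is omitted from the weights. Summing the resulting probabilities up to an observed trial z gives the cumulative conditional probability used in the test of the model.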

At this point it may be noted that selecting regions of acceptance for 
each $n_i$ separately rather than designing a single region for their joint distri- 
bution actually increases the size of the test; i.e., the total probability that 
at least one of the rows in any given column would give a result leading to 
rejection of the model when it is true is greater than θ. The effect is illustrated 
in Figure 2, where the test is applied to the case of $n_1$ and $n_2$, given $n_k$, k > 2. 


FIGURE 2 
Schematic Representation of Regions of Acceptance 


Since $n_2$ is greater than $n_1$, the joint probabilities are greater than zero 
above the diagonal, and zero elsewhere. R represents the region of acceptance 
in which the sum of joint probabilities is 1 − θ. Construction of R involves 
laborious mathematical computations which would be hardly justifiable in 
the absence of specified alternatives to the model. Since a test of the model 
at this stage is intended merely as a detecting device of obvious fallacies, 
if any, the substitute procedure, defined by (35), consists of selecting separate 
regions of acceptance for $n_1$ and $n_2$, such that in each case the sum of mar- 
ginal probabilities excluded on each side of the region is θ/2. This construction 
of the regions of acceptance yields probability intervals which are approxi- 
mately the shortest possible, since the ordinates which form the limits of this 
region are approximately equal. Such a test should give good power against 
most reasonable alternatives to the model. 

In Figure 2 the area bounded by the vertical lines A and B is the region 
of acceptance for $n_1$, and the sum of joint probabilities within this area is 
1 − θ. Similarly, the area bounded by the horizontal lines C and D is the 
region of acceptance for $n_2$, and the sum of joint probabilities within this 
area is also 1 − θ. With this construction the condition for non-rejection is 
that both $n_1$ and $n_2$ occur within the rectangle bounded by the four lines. 
It is clear, however, that the sum of joint probabilities within this rectangle 
is less than 1 − θ. Hence, the size of the test is actually greater than θ; or, 
with a diminished probability of Type II error, the power of the test is some- 
what higher than that of the precise test based on the region of acceptance R. 


VII. Conclusions 


The mathematical model for conditioning has been presented primarily 
as an illustration of a technique for the treatment of intervening variables, 
rather than as a substitute for the existing systems. Since the model does not 
demand rigor in the definition of these variables, it has possibilities of adapta- 
tion to theories of behavior which regard autonomous central processes as 
crucial. The author is currently engaged in one such adaptation, extending 
the model to include problems in discrimination learning. The extension 
should furnish a method for treating Krechevsky’s “hypotheses” (8) as 
stochastic variables, thus providing a testable alternative to Spence’s model 
(18). 

In another application, the model provides, in the value of N, a measure 
for the study of individual differences. Organisms requiring many trials to 
reach a stipulated criterion will, on the average, yield larger estimates of N. 
Since N is the sole parameter involved, the phenomenon of slow learning 
may be interpreted as inability on the part of the organism to restrict its 
range of vigilance to the situation at hand. Since, however, the author knows 
of no way to test the latter inference, it must remain, at least for the moment, 
on the level of intuitive generalization. On the other hand, estimates of N, 
based on a limited number of trials, are intended to provide [...] quantita- 
tive indices for a variety of psychological investigations, ranging from [...] 
of stratified samples to problems of heredity and environment. [...] 
for the same [...] the parameter may furnish a comparative estimate 
of the efficacy of various learning situations. 

Studies [...] large populations are often prohibitive [...] 
and effort required to train each subject to a criterion of [...] 
training each subject to a predetermined number of conditioned [...] 
one is able to make a reasonably accurate quantitative [...] 
[...] substantial economy in labor. Further- 
more, the potentialities of the parameter [...] for the construction of gradients of 
similarity [...] eventually to a reexamination of 
generalization and pseudoconditioning in the light of mediating factors 
susceptible to systematic treatment. 

The model may be regarded from one of two contrasting theoretical 
positions. One may either take the stand advocated by Skinner (17) and 
view the parameter N purely as “a formal representation of the data reduced 
to a minimal number of terms,” or one may follow the course suggested by 
Pratt (15) and postulate independent existence of neurophysiological [...] 
corresponding to this parameter. The author leans towards the latter point of 
view because of its potentiality as a source of future hypotheses. Moreover, 
improved techniques of observing and recording specific vigilance reactions 
may ultimately lead to an independent estimate of the parameter N , thus 
serving as the second of “at least two methods” stipulated by Bridgman (1) 
“of getting to the terminus.” 
TWO MODELS OF GROUP BEHAVIOR IN THE SOLUTION OF 
EUREKA-TYPE PROBLEMS* 


IRVING LORGE 
AND 
HERBERT SOLOMON 

A study by Shaw (7) some twenty y 


social scientists to support the ge 
individuals in problem-solving. Shaw suggests 


within the group is responsible for the superior per à 
article re-examines her data in the light of two models which propose that 


the difference in quality of solution between group а п 
is solely a matter of ability. It is shown that Shaw’s data may be considered 


to have been an outcome of behavior postulated by the models. Since Shaw’s 
observations relate to а special рона and to special kinds of problems, 
the proposed models may not be appropriate 


conditions. In fact, Lorge et al. (4) have indicatec і ] с 
stration of the superiority dividuals in problem-solving 


depends not only on 
solved. In addition, 
individuals is considered. 


Introduction 

Since this article treats only the data from the first half of the Shaw 
experiments, a brief description of this part will be given. Three problems (3), 
each a well-known mathematical puzzle involving the transport of objects 
under certain constraints, were given to groups and to individuals. The 
first, known historically as the Tartaglia, requires the transport of three 
jealous husbands and their three beautiful wives across a river in a boat 
holding just three at a time, under the constraint that no husband will allow 
his wife in the presence of another man unless he is also present, and with 
the specification that only husbands can row. The second problem, the 
historical Alcuin, is similar in that it requires the transport of three mission- 
aries and three cannibals in a boat carrying two at a time under the constraint 
that missionaries may never be outnumbered by cannibals, and with the 
specification that all missionaries and just one cannibal have mastered the 
art of rowing. The third problem, the historical Tower of Hanoi, or disc 
problem, is similar to the previous two in that it requires the transport of 
three graduated discs, stacked in order of size, to another position via an 
intermediate way station, under the constraint that a larger disc may never 
be placed on a smaller one, with the specification that only one disc may be 
moved at a time. 

* Supported by the Office of Naval Research under Contract N6 onr 266 (21) 
and the Air Force Personnel and Training Research Center under Contract AF 18(600)-341. 

Shaw's subjects were students in a social psychology class which had 
been divided into halves: one half being formed at random into ad hoc like-sex, 
four-member groups, and the other half serving as individuals, i.e., as controls. 
Thus, the performances of five groups were contrasted with those of twenty- 
one individuals. Each group and each individual was asked to solve all three 
problems in the same sequence. 

A criterion for comparing group and individual performance is the 
contrast between the proportion of individuals and the proportion of groups 
successful in the solution of each problem. For Shaw's three problems, the 
proportions of individuals and groups mastering each solution are given 
in Table 1 (Columns 1 and 3). When, for each problem separately, the differ- 
ence between proportions of success in groups and in individuals is tested, 
using an upper one-sided .05 critical region, the data for Problems I and II 
support the generalization of group superiority, but the difference between 
groups and individuals for Problem III is not statistically significant. The 
statistical test (2, 6) of the hypothesis that two proportions are equal is 

$$z = \frac{\theta_G - \theta_I}{\sqrt{\dfrac{1}{N_G} + \dfrac{1}{N_I}}}, \qquad (1)$$

where $\theta = 2 \arcsin \sqrt{p}$, p = 
and the subscripts J and G refer 
The function z is approximately 
unit variance under the hypothesi 
be used to support Shaw's conclusion (7 
much larger proportion of correct i 

Of the five groups, however, 
solve all problems. Of the twenty. 
one of the three problems. Т he fac 


c of the problems and two 
-one individuals, none solves more than 
t that some groups solved none and some 


problem may be composed of, and 
› of course, reduces to Model A for 


Model A

Under Model A the probability of a group solution is the probability
that the group contains one or more members who can solve the problem.
This non-interactional ability model for any specific problem can be expressed
mathematically as follows: Let

$P_G$ = the probability that a group of size $k$ solve the problem;
$P_I$ = the probability that an individual solve the problem.

Then

$$ P_G = 1 - (1 - P_I)^k \qquad (2) $$

where $P_G$ and $P_I$ are population parameters considered fixed for the specific
problem and the specific population.
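
As an illustration, equation (2) and its algebraic inverse can be written as two short Python functions. The names `group_prob_model_a` and `individual_prob_model_a` are hypothetical, and the group size $k = 4$ is taken from Shaw's four-member groups.

```python
def group_prob_model_a(p_individual, k):
    """Equation (2): probability that at least one of k members can solve."""
    return 1.0 - (1.0 - p_individual) ** k

def individual_prob_model_a(p_group, k):
    """Inverse of equation (2): individual probability implied by a group one."""
    return 1.0 - (1.0 - p_group) ** (1.0 / k)

K = 4  # Shaw's groups had four members

# Problem I: 3 of 21 individuals and 3 of 5 groups were successful.
p_ga_problem_1 = group_prob_model_a(3 / 21, K)
p_ia_problem_1 = individual_prob_model_a(3 / 5, K)

print(f"group proportion implied by individuals: {p_ga_problem_1:.2f}")
print(f"individual proportion implied by groups: {p_ia_problem_1:.2f}")
```

Applied to the sample proportions, these functions give the Model A entries of Table 1 (.46 and .20 for Problem I, to two decimals).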
Confidence in the tenability of this non-interactional ability model
can be decided by testing it on the basis of sample observations. Assume
$N_G$ observations of group performance and $N_I$ of individual performance.
Then sample estimates $p_G$ and $p_I$ may be obtained, where $p_G$ and $p_I$ are the
ratios of the observed successes to attempts for groups and for individuals,
respectively; $p_G$ should be compared with $p_{GA}$ (or equivalently, $p_I$ with
$p_{IA}$), where

$$ p_{GA} = 1 - (1 - p_I)^k, \qquad (3) $$

or equivalently

$$ p_{IA} = 1 - (1 - p_G)^{1/k}. \qquad (3a) $$

The observed difference $(p_G - p_{GA})$ certainly can be used as a test of
the model, for the smaller the observed difference, the more tenable is the
model and, the larger the observed difference, the less tenable it is. If an $\alpha$
level of significance is used, then the model would be rejected if

$$ \Pr\{(p_G - p_{GA}) > O_d\} \le \alpha $$

and accepted otherwise, where $O_d$ is the observed difference. A one-sided
test is used since negative personal interaction (an unable majority preventing
an able minority from solving the problem) is not anticipated in the Shaw
groups, and thus the test is made most powerful against all alternatives
indicating positive personal interaction. That is, if positive interaction does
exist, the probability of rejecting Model A is higher than the probability
given by a two-sided test of the same size. A similar argument holds for
$(p_{IA} - p_I)$, since it is an equivalent test.
To test the existence of the model, the distribution of $(p_G - p_{GA})$ must
be obtained. Although $p_G$ and $p_{GA}$ are independently distributed proportions,
the distribution of their difference is no longer related to the standard
distribution of the difference of two binomials since $p_{GA}$ is not a binomial;
$p_{GA}$ is a function of $p_I$, which is a binomial. This complicates obtaining
the exact distribution of $(p_G - p_{GA})$ either in closed form or in a form such
that existing tables may be used. Since sample sizes are small, however, it is
not too tedious to compute the exact probabilities of all differences larger
than the observed difference under the assumptions that (1) the model holds and
(2) the nuisance parameter (either $P_G$ or $P_I$) is replaced by a sample estimate.
It is interesting to note that 


" =P,)" , РИ] — РӘ = ИР 
alu oP ат ) ats i( 1) ( В] 


№ 
P,1 — Р)(1 — 6P, + 6P} 
x T ; 
and 
— Р P 
бе, = TH? e — py BED +... + m. 


where JP. = l; 2, ШЕ 
for large N ; , po, 
dE 


` , 6, are eighth-degree polynomials in Pa. Thus, 
is an unbiased estimate of Ре and its variance is 16(1 — Р) 


For the three Shaw problems, there are six possible values for $p_G$ and
twenty-two possible values of $p_I$. In Problem I, for instance, the observed
difference $(p_G - p_{GA})$ is .14, where $p_{GA}$ is computed from formula (2) using
the value of $p_I$ reported by Shaw. It is necessary, therefore, to tabulate all
possible differences greater than the value .14. For these tabulated differences,
the probability of each is computed under the specified assumptions. The
probability for each difference is the product of the probabilities that the
$p_G$ and $p_{GA}$ involved in the difference do occur when the two assumptions
hold. The probability that a $p_{GA}$ occurs is equal to the probability that its
corresponding $p_I$ occurs. The probability for $p_G$ and $p_{GA}$ may be obtained
readily by reference to a binomial table (5). The sum of these products of
probabilities is the exact probability that an observed difference will exceed
.14. In Table 1, column five gives the exact probability, $P$, that the observed
difference $(p_G - p_{GA})$ will be exceeded by chance.
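
The tabulation just described can be sketched in Python as a double sum over the binomial distributions of the group and individual success counts. Several points are assumptions made for this illustration and are not stated in the text: the function name `exact_tail`, the `fix` argument choosing which nuisance parameter is replaced by its sample estimate, and the decision to count differences equal to the observed one.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)

def exact_tail(x_g, n_g, x_i, n_i, k, fix="individual"):
    """Probability under Model A of a difference p_G - p_GA at least as large
    as the observed one (ties included; that choice is an assumption)."""
    model_a = lambda p: 1.0 - (1.0 - p) ** k
    if fix == "individual":          # replace P_I by the sample value p_I
        big_p_i = x_i / n_i
        big_p_g = model_a(big_p_i)
    else:                            # replace P_G by the sample value p_G
        big_p_g = x_g / n_g
        big_p_i = 1.0 - (1.0 - big_p_g) ** (1.0 / k)
    observed = x_g / n_g - model_a(x_i / n_i)
    total = 0.0
    for g in range(n_g + 1):         # every possible group count
        for i in range(n_i + 1):     # every possible individual count
            if g / n_g - model_a(i / n_i) >= observed - 1e-12:
                total += binom_pmf(g, n_g, big_p_g) * binom_pmf(i, n_i, big_p_i)
    return total

# Problem II: 3 of 5 groups, 0 of 21 individuals; P_G replaced by p_G = .60.
tail_problem_2 = exact_tail(3, 5, 0, 21, k=4, fix="group")
# Problem I: 3 of 5 groups, 3 of 21 individuals; P_I replaced by p_I.
tail_problem_1 = exact_tail(3, 5, 3, 21, k=4, fix="individual")
print(round(tail_problem_2, 3), round(tail_problem_1, 2))
```

For Problem II the individual estimate is zero, so only the group estimate gives a non-degenerate model; which estimate was used for the other problems is not stated, so the sketch leaves the choice to the caller.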


An approximation to the exact probability can be made when $p_I$ is
small enough so that $p_{GA}$ can be approximated by $kp_I$, for then

$$ 2 \arcsin \sqrt{p_G} \quad \text{and} \quad 2 \arcsin \sqrt{k p_I} $$

are approximately normally distributed with variances

$$ \frac{1}{N_G} \quad \text{and} \quad \frac{k}{N_I}, \quad \text{respectively.} $$

Thus, if Model A holds,

$$ z = \frac{2 \arcsin \sqrt{p_G} - 2 \arcsin \sqrt{k p_I}}{\sqrt{\dfrac{1}{N_G} + \dfrac{k}{N_I}}} \qquad (4) $$

is approximately normally distributed with zero mean and unit variance.
Liberties have been taken in this approximation by assuming $kp_I$ to
be binomial since it can assume values greater than one. This assumption,
apparently, does not impair its usefulness for the Shaw experiments. In
Table 1, column six gives $P' = \Pr\{z > z_0\}$, where $z_0$ is the specific value for
$z$ corresponding to the observed difference. Notice that the approximation
obviously gets better as $p_I$ decreases.
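
A short Python sketch of this normal approximation is given below for illustration. It assumes that the two arcsine-transformed quantities have variances $1/N_G$ and $k/N_I$, which is a reading of the damaged formula rather than a certainty; the function name `approx_tail` is hypothetical.

```python
import math

def approx_tail(x_g, n_g, x_i, n_i, k):
    """Upper-tail probability for the arcsine approximation to Model A.

    Assumes variances 1/n_g and k/n_i for the two transformed proportions.
    """
    p_g = x_g / n_g
    kp_i = k * x_i / n_i
    if kp_i > 1.0:
        # k * p_I is treated as a proportion, so it must not exceed one.
        raise ValueError("k * p_I exceeds one; the approximation does not apply")
    z = (2.0 * math.asin(math.sqrt(p_g)) - 2.0 * math.asin(math.sqrt(kp_i)))
    z /= math.sqrt(1.0 / n_g + k / n_i)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

approx_problem_1 = approx_tail(3, 5, 3, 21, k=4)
approx_problem_3 = approx_tail(2, 5, 2, 21, k=4)
print(f"Problem I: {approx_problem_1:.3f}, Problem III: {approx_problem_3:.3f}")
```

The guard against $kp_I > 1$ reflects the liberty noted in the text: the product is treated as a proportion although it can exceed one.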

The hypothesized non-interactional ability Model A, thus, is rejected
for Problem II, but accepted as tenable for Problems I and III. For each of
the three problems, however, $p_G$ exceeds $p_{GA}$, suggesting that Model A
might be modified and improved.

TABLE 1

              p_I            p_IA    p_G          p_GA    P       P'
Problem I     3/21 = .14     .20     3/5 = .60    .46     .38     .45
Problem II    0/21 = .00     .20     3/5 = .60    .00     .029    .023
Problem III   2/21 = .095    .12     2/5 = .40    .33     .43     .48

p_I  = ratio of individual solutions to attempts
p_G  = ratio of group solutions to attempts
p_IA = estimate of P_I from Model A and observation p_G
p_GA = estimate of P_G from Model A and observation p_I
P    = probability (p_G - p_GA) is exceeded by chance under Model A and P_G or P_I
       is replaced by sample estimate
P'   = approximation of P replacing p_GA by kp_I


Stage-wise Solutions

Within the framework of strict ability models, a modification of Model
A may be made. Solution of eureka-type problems may be considered the
consequence of pooling success at each of several stages of the problem.
Shaw's study, indeed, suggests the plausibility of such a stage-wise model.
In reporting about the erroneous moves made by her subjects in solving
Problem I she states that 13 different individuals made an error in the
first move, four made an error in [...] in the third move, and one [...]
the fifth. For groups, however, she reports "No group erred on the first
move; one erred on the third and one on the fourth."

Shaw's description of the errors in Problem I suggests the importance
of the first move, since 13 of the 21 individuals failed to make the correct
first move. Each group, however, apparently had in it at least one member
who made the first move successfully since none of the five groups erred on it.
Once the first move is accomplished, the difficulty of the problem changes.
Five individuals who made the first move correctly did fail at subsequent
stages, i.e., made the first correct move but failed at later moves. Two groups
failed at some later move, suggesting that the group lacked at least one
member who could accomplish some later move.

Assuming that a problem is solved in $s$ independent stages (not the
moves Shaw mentions, since such moves may be interrelated) and assuming
that Model A (equation 2) applies at each stage $j$, then,

$$ P_G = \prod_{j=1}^{s} \left[ 1 - (1 - P_{Ij})^k \right], \qquad P_I = \prod_{j=1}^{s} P_{Ij} \qquad (5) $$

where $s$ is the number of stages, and $P_{Ij}$ is the probability of success for an
individual at stage $j$. Now for the purpose of estimating $s$ from the Shaw
data, consider the assumption that $P_{Ij}$ is the same for each stage; thus
$P_{Ij} = P_I^{1/s}$, then

$$ P_G = \left[ 1 - (1 - P_I^{1/s})^k \right]^s. \qquad (5a) $$

This assumption may possibly be unrealistic, but it is necessary to provide
an estimate of $s$ from Shaw's data.
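
For illustration, the equal-stage relation can be evaluated and solved for $s$ numerically. The Python sketch below assumes that each stage has individual success probability $P_I^{1/s}$, that a real-valued $s$ is acceptable before rounding, and that the search bracket contains a single crossing; the function names are hypothetical.

```python
def group_prob_stagewise(p_stage, k, s):
    """Group success when Model A holds at each of s equally hard stages."""
    return (1.0 - (1.0 - p_stage) ** k) ** s

def group_prob_equal_stages(p_individual, k, s):
    """Equal-stage form, with stage probability p_individual ** (1 / s)."""
    return group_prob_stagewise(p_individual ** (1.0 / s), k, s)

def solve_stages(p_individual, p_group, k, lo=1.0, hi=20.0, tol=1e-9):
    """Bisection for the real s at which the equal-stage form equals p_group."""
    if not 0.0 < p_individual < 1.0:
        raise ValueError("s is indeterminate unless 0 < p_individual < 1")
    f = lambda s: group_prob_equal_stages(p_individual, k, s) - p_group
    if f(lo) > 0.0 or f(hi) < 0.0:
        raise ValueError("no solution for s inside the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

s_problem_1 = solve_stages(3 / 21, 3 / 5, k=4)
s_problem_3 = solve_stages(2 / 21, 2 / 5, k=4)
print(f"Problem I: s = {s_problem_1:.2f}, Problem III: s = {s_problem_3:.2f}")
```

Under these assumptions the root for Problems I and III falls between one and two stages, and a zero individual proportion, as in Problem II, leaves $s$ indeterminate.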


Substituting the estimates for Pg and P, from Shaw’s Problems I 
and III, s = 2 (to the n i 


m II is zero, s is indeterminate. (If for Problem II, P; is 


1 е › ап estimate even larger tj 
required in some of the $ i 


З l possible pair: alues 
Ре and p, have not been considered п sio inel 
value of s inconsisten 
Steps or Stages. 


Model B
On both a probabilistic and
be reasonably inferred;

Population Type    Ability                       Proportion in the Population
X_1                Solve both stages             P_1
X_2                Solve stage 1, not stage 2    P_2
X_3                Solve stage 2, not stage 1    P_3
X_4                Solve neither stage           P_4

Assuming this multinomial distribution of ability, appropriate ability
interaction within a group of four individuals can accomplish a solution
even though the group has in it no one member who can solve the problem
as a whole; for example, the group whose members symbolically are
represented as $X_2 X_4 X_3 X_3$. Consider all possible samples of four
$(X_i X_j X_k X_l)$ from this population. It is possible to enumerate all groups
of four that can interact to accomplish whole solutions solely by pooling their
abilities. Any group containing at least individual $X_1$, or at least individuals
$X_2$ and $X_3$ jointly, will be successful. The probability of occurrence of each
sample of four is given by the multinomial distribution if $P_1$, $P_2$, $P_3$, $P_4$
are known. The sum of probabilities of the occurrence of each group of four that
can complete a stage-wise solution is the probability of a group solution on the
hypothesis of stage-wise pooling of ability. Thus, under Model B, the
probability of a group solution is obtained by a special summation of the elements
of the multinomial distribution.
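
The special summation can be sketched in Python by enumerating every composition of a group of $k$ members over the four ability types and adding the multinomial probabilities of the successful ones. The function names are hypothetical, and the closed form used as a cross-check (one minus the probability that the group has no $X_1$ and lacks either $X_2$ or $X_3$) is an addition for this illustration that requires the four proportions to sum to one.

```python
from itertools import product
from math import factorial

def group_prob_model_b(p1, p2, p3, p4, k=4):
    """Sum the multinomial probabilities of all successful group compositions."""
    probs = (p1, p2, p3, p4)
    total = 0.0
    for counts in product(range(k + 1), repeat=4):
        if sum(counts) != k:
            continue
        n1, n2, n3, _ = counts
        # Success: some member solves both stages, or the two stages are
        # covered jointly by an X_2 member and an X_3 member.
        if n1 >= 1 or (n2 >= 1 and n3 >= 1):
            coef = factorial(k)
            for n in counts:
                coef //= factorial(n)
            term = float(coef)
            for n, p in zip(counts, probs):
                term *= p**n
            total += term
    return total

def group_prob_model_b_closed(p1, p2, p3, p4, k=4):
    """Closed-form cross-check; assumes p1 + p2 + p3 + p4 == 1."""
    return 1.0 - ((p2 + p4) ** k + (p3 + p4) ** k - p4**k)

p_gb_example = group_prob_model_b(0.15, 0.15, 0.15, 0.55)
print(f"{p_gb_example:.2f}")
```

The enumeration mirrors the verbal description term by term, while the closed form is convenient when many sets of proportions are to be tried.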
Currently, not enough knowledge is available for estimating all the
probabilities $P_1$, $P_2$, $P_3$, and $P_4$. At best, in line with current knowledge
of the distribution of ability, the psychologist can merely supply reasonable
estimates for $P_2$, $P_3$, and $P_4$. In Shaw's data, $P_1$ can be estimated from
the sample. This still leaves two degrees of freedom for choices since the
sum of the probabilities is one.
These two free choices are subject to the restriction that they
closely reproduce $p_G$ and that they are not inconsistent with psychological
knowledge of the distribution of ability. For the kind of problems treated
by Shaw, psychological evidence indicates that the percentage of individuals
who will fail on both stages will be larger than the percentage that passes
both stages or any one stage. This, of course, does not [...]
the four parameters but it is interesting to see [...]
exist. For example, if in Shaw's Problem I, $P_1$ = .15 ($p_I$ = .14), $P_2$ = .15,
$P_3$ = .15, and $P_4$ = .55, then $p_{GB}$ = .61 as contrasted with $p_G$ = .60.
$P_2$, $P_3$, and $P_4$ were guesses to reproduce the observed [...]
not inconsistent with the [...]
it w[...] considered necessary [...] already been taken to reproduce a
sample value. Moreover, [...] would not alter any decisions about
the reasonableness of the $P_i$'s. This argument also applies in the following
discussion of Problems II and III. Incidentally, $P_1$ = .15 leads to
$P_1 + P_2 = P_1 + P_3 = .30$; this indicates that the probability of an individual's
success in stage 1 and stage 2 is .30. By (5a), $P_G$ = .58 as contrasted with
$p_G$ = .60, which suggests that the assumption which yields (5a) from (5)
is realistic after all.
Moreover, if in Shaw's Problem I, $P_1$ = .15, $P_2$ = .30, $P_3$ = .30, and
$P_4$ = .25, a situation definitely inconsistent with the distribution of ability,
we get $p_{GB}$ = .92, a value noticeably different from $p_G$ = .60.
Similarly for Problem III, if $P_1$ = .10 ($p_I$ = .0952), $P_2 = P_3$ = .10,
$P_4$ = .70, then $p_{GB}$ = .42, as contrasted with $p_G$ = .40. Also referring to
(5a), $P_G$ = .35, as contrasted with $p_G$ = .40. In Problem II, $P_1$ = .2, $P_2$ =
$P_3$ = .05, and $P_4$ = .70 yields $p_{GB}$ = .61, as contrasted with $p_G$ = .60;
again referring to (5a), $P_G$ = .46, as contrasted with $p_G$ = .60. It should be
i es from the use of p,, = .2, which 
were р; . Notice that this is reflected 
3 = 0 is a one-stage model. Substitituion 
would yield nonsensical results. This 


It is interesting to note the premium gained b 
Model B can be made to ac 


I, and .077 for Problem III; these 
в — Pos) = .14 for Problem I, and .07 for Problem 


TABLE 2

                 Problems
         I       II      III
p_I     .14     .00     .095
p_G     .60     .60     .40
p_GA    .46     .00     .33
p_GB    .61     .61     .42
P_1     .15     .20     .10
P_2     .15     .05     .10
P_3     .15     .05     .10
P_4     .55     .70     .70

p_I  = ratio of individual solutions to attempts
p_G  = ratio of group solutions to attempts
p_GA = estimate of P_G from Model A and observation p_I
p_GB = estimate of P_G from Model B and [...]

III. For Problem II, the weights used lead to an excess of .017, but this is
just another reflection of the fact that the replacement of $p_I$ by $p_{IA}$ leads to
a one-stage problem.

The stage-wise model hypothesizing the pooling of ability tends to
reproduce the observed $p_G$ when reasonable weights are used. Indeed,
unreasonable weights produce major discrepancies from the observed $p_G$.
The implication of the model is that group superiority may be conceived
as a function only of pooling the abilities of its members. Ultimately, empirical
estimates must be obtained for $P_2$, $P_3$, and $P_4$. One experimental procedure
for such estimates would require individuals to solve the problem. For
those individuals solving the problem [...] is for estimating $P_1$. Some individuals
who failed the whole problem, however, will have accomplished stage 1
successfully but failed on stage 2, providing a basis for estimating $P_2$. The
remainder, those who cou[...] 1, would be given the
problem reduced by the accomplis[...]
the requirement that the "new" problem be sol[...]
will then solve the "new" problem providing a basis for estimating $P_3$.
When $P_1$, $P_2$, $P_3$, and $P_4$ are estimated by $p_1$, $p_2$, $p_3$, and $p_4$ on the
basis of sample observations, assuming Model B holds, a value $p_{GB}$ will be
obtained and contrasted with $p_G$. As in Model A, the probability that an
observed difference will be exceeded by chance must be computed in order to
examine the tenability of the model. Under the assumption that Model B
holds, and replacing $P_1$, $P_2$, $P_3$, and $P_4$ by their estimates, it is possible
to obtain the exact distribution of $(p_G - p_{GB})$, although it is extremely
tedious to compute. If $p_i$ is based on $n_i$ observations, then $p_{GB}$ can assume
$(n_1 + 1)(n_2 + 1)(n_3 + 1)(n_4 + 1)$ [...] if the sample sizes are
small, say $n_i = 5$, $p_{GB}$ takes on [...] plus the difficulty of
actually computing the probability [...]
technique somewhat useless. Moreover, [...]
seems fruitless because of the special w[...]

ion is summed for this situation. , erval for Ро , SAY Ро, and Pas 
he sets Pi» P2> Ps » Ps 


. Suppose, however 
15 А етт from Pe - Assuming the model hold» Ж nfidence region 
which yield values between Pe; i ider th 
for P, UP. iy P, . Actually all that need be done Free perm 
value pe, yielded by the observed р: › P: T. Че Ресей confidence с 

«, and Pg, the model is tenable for the SP znificance ү 


for the Si 
employed, let us say 1 — ^ lently fo 


thesizing the pooling of ability tends to 
ble weights are used. Indeed, un- 


or equiva! 


Pooling of Data

Shaw pools the results for the three problems, neglecting the fact that
the same individuals and same groups worked the three problems in the same
sequence. Thus, she contrasts 8/15 or 53 per cent success for groups with
5/63 or 7.9 per cent success for individuals. Using the z test given by (1)
with the awareness that the lack of independence renders it inadequate,
this difference is statistically significant at the 5 per cent level. Moreover,
since the correlation between observations can be assumed to be positive,
the decision of statistical significance is on the conservative side. Also,
Model A is rejected using the z test given by (4). It should be emphasized
that of the five groups, two solve none of the three problems and two solve
all. Of the twenty-one individuals, none solves more than one of the three
problems! Two alternate hypotheses are suggested: 1) Model B is operating;
2) groups do better than individuals in a sequential solution of problems of
the same kind. Hypothesis 2 can arise from three possibilities: (a) negative
transfer in individuals, zero or positive transfer in groups; (b) zero transfer in
individuals, positive transfer in groups; (c) positive transfer in individuals,
greater positive transfer in groups. As regards hypothesis 2, Cook (1), using
two versions of the disc problem (Problem III), varying in difficulty of
sequence, implies "that transfer 'spuriously' lowers the probability of a
given individual achieving the same degree of success or failure (relative to
the rest of the groups) on both problems." The evidence from Shaw's groups
by indicating the plausibility of 
solution of problems of the same 
ascertain the superiority of groups 
suggested by this combined evidence.

In principle it is possible to design automata to display any explicitly
described behavior. The McCulloch-Pitts "neuron" is a convenient elementary
component for the control mechanisms of automata. Previously described
techniques permit the design of an automaton which would arbitrarily well
simulate human behavior. The difficulty of producing such a design lies
primarily in formulating an explicit description of the required behavior.
The control mechanism of such an automaton would be of very great logical
complexity. Its mode of operation probably would not resemble that of a
human brain. The brain is more plausibly represented by stochastic models
as proposed by Hebb. Such models can more easily be designed or understood
by reason of lesser logical complexity. A method of computational
investigation of the functioning of such stochastic models is described. Several
extremely simple models have been investigated. One is shown to have
properties suggestive of learning ability.


I. Introduction

The possibility of constructing or designing devices which can to some
extent simulate the complex patterns of behavior of man or the higher
animals, particularly those aspects of behavior favored by an intact nervous
system, has long excited lively speculation. It seems desirable to continue
such speculation in the present era and to bring modern resources to bear
on the problem in the hope that some light may be shed on the mechanisms
operative in these complex behaviors.

At best it can be hoped that this approach may yield hints of possible
mechanisms; yet, in light of the formidable difficulties of direct studies of
complex nervous systems, even this modest hope amply motivates such
investigations. The utility of this approach need not be further stressed since
many authors have testified to its fruitfulness.
A number of lines of argument directed toward the establishment of
limits on the range of behaviors possible for an automaton have
been explored. Several often-proposed arguments to this end have been
shown by Turing (14) to be incapable of clear formulation or [...]
leading to the desired limitation. These arguments seem to be stimulated
by a common motive: in the course of a normal childhood a
person finds it possible to some extent to [...]
observation of the external world, including [...]
*Present address: 30792 Driftwood Dr., So. Laguna, Calif. The author is indebted
to many friends for helpful discussion and criticism [...] to Miss Winifred Whitfield
and Dr. John von Neumann.

this orderliness he finds himself possessed of a measure of control. The
subjectively recognized “self” is somewhat separated from the environment 
in the ordering. Since the analogy between his objectively observed self and 
his companions is too striking to be overlooked, there arises the desire to 
exempt his kind from the observed lawfulness of the external world. (The 
term “kind” by intent lacks precision. Primitive peoples have often extended 
this exemption very generously. The modern tendency seems to be to restrict 
it to our own species or, more narrowly, to one’s own tribe, sex, sect, etc.) A 
view which may be so motivated is aptly expressed by Jefferson (6): “No 
mechanism could feel (and not merely artificially signal, an easy contrivance) 
pleasure at its successes, feel grief when its valves fuse, be warmed by flattery, 


be made miserable by its mistakes, be charmed by sex, be angry or depressed 
when it cannot get what it wants." 


That an upper bound can in fa 
which an automaton could display can be show; 


is finite, however lengthy). He shows that th 
denumerable; thus they are negligibly few


us centuries that a quite stringent 
nstructing an automaton to mimic 
i SS of constructable elements of ma- 
chinery—gears, levers, pulleys, ete—in comparison with human size. This 
here, since it is of little importance
to the present purpose that an automaton look like a man. In any case, this
appearance of limitation is weakened by the present development of semi- 
conductor electrical elements, which promises almost unlimited miniaturiza- 
tion of the types of electronic apparatus which now seem suitable elements 
for the construction of automata. 

It is also to be noted that valuable hints to neurophysiology may arise
from the design of an automaton which, by reason of technical or economic 
limitations, may not be constructed in the metal. However, as the example 
of Walter’s Testudo (16) strikingly displays, the verisimilitude of an auto- 
maton's simulation of animal behavior can far better be judged by direct 
observation of the behavior of the automaton than by study of its wiring 
diagram or differential equations. It thus seems desirable to give preferential, 
although not exclusive, consideration to those designs for automata which 
could be built without extravagant effort. 

Another line of argument bearing on the range of behavior patterns 
accessible to man-made automata has been explored by von Neumann (15). 
It is clear that the complexity of the behavior pattern of an automaton is 
subject to an upper limit dependent on the complexity of its mechanism. 
This complexity in turn would appear to be limited by the extent of human 
ingenuity, which may in turn be similarly limited. One is at first tempted 
to believe that suitable measures of these complexities would permit proof 
of a hierarchic ordering of machines. Further, he might be tempted to believe 
that a machine can fabricate, or in some sense design or conceive, only 
machines of lesser complexity than itself; similarly he might think that a 
man-made automaton must be, in respect to this measure, inferior to its 
creator. It may seem that some degradation of information must occur 
between the construction and operation of a machine. The hope for such a
theorem is damped by von Neumann's description, in outline, of a machine
capable of fabricating a duplicate of itself after first (this to exclude trivial
solutions) building a locomotive. The trick of design permitting this
duplication depends on the distinction between the actual and logical complexities
of a machine. For example, the elaborate set of instructions which governs
its manipulations may be carried as a perforated tape which, though of
great actual complexity, can be copied by reiteration of a simple elementary
operation.


One might still hope to show it to be impossible for a man to understand 


a device of a complexity equal to his own even though he ош р 
(suitable meanings being given to these terms). Such a чы e Ros 
require comparisons of logical complexity, 50 defined Hus ти Е i» 
markedly with simple ges of Mei К ES ү a и 

1 for example, by & simple cow | [ t 
т given by Turing (13) would seem to i Rud 
this hope. This instrument, of finite logical complexity, is capable of
predicting the operation of any computer, of however great complexity.
This great reduction in complexity is obtained at the expense of speed of
operation which, though desirable, can hardly be regarded as of fundamental
significance in human understanding. Moreover, an increase in speed can
often be obtained by duplication of components without increase in logical
complexity. It thus does not seem likely that arguments along these lines
can place the desired limit on the range of behavior possible for an automaton.


II. Neural Network Models


Automata have variously been conceived as primarily composed of 


of vacuum tubes, relays, etc., (2). 
nents lies in the ease with which 
ctically operative equipment. If 
equired, the advantage lies rather 
cams, detent gears, levers, and 


anufacturable equipment, yet bear 
physiology. 


received effects of the two kinds. The functi 
on the two numbers is subject to choice. 

The mathematical neuron is well adapted to the design of automata.
One can, with surprising ease, design circuits to mediate even quite complex
activities (3). The chief difficulty lies in formulating a precise description of
the desired activity. The process of translating this description into circuitry
can be carried out essentially by rote. It is thus not [...]
that circuits could be designed to [...]
behavior of a human adult. It [...]
suitable interlocks which suppress th[...]
appropriate environmental circumst[...]
of the argument to follow it will [...]

III. Defects of the Block-Diagram Model


The neural-network-controlled automaton so designed seems at first
to supply a complete answer to the primary question of this investigation.
It would simulate human behavior in every situation taken into consideration
by the designer; thus, the completeness of the simulation is limited only by
his patience. Yet for a number of reasons this answer is unsatisfying; this
block-diagram automaton does not seem to present a close analogy to the
observed human nervous system as indicated by the following considerations:

(1) The automaton might include some circuits permitting it, say, to
display a full command of every modern language and further circuits inhibiting
their action until environmental circumstances (e.g., exposure to a course
of study of a language) make the display of each ability seemly. This heroic
design effort might require the use of most of the $10^{10}$-odd neurons permitted
by direct analogy, yet would still not provide, for example, for the learning
of Sanskrit. The view that human ability to learn any few of the enormously
many known languages is based on the release from inhibition of precisely
arranged circuits is strikingly unappealing.

(2) The learning ability of the higher animals seems quite unsystematic
in comparison with that of an automaton designed in this way. For example,
a man's capacity to learn to drive an automobile or a rat to learn a T-maze
can hardly be ascribed to the pressure of natural selection in the short period
since the introduction of these features of environment. They rather suggest
the operation of an unspecific learning ability operative in a wide range of
circumstances. An automaton which mimics the behavior of a laboratory
rat by means of many marvelously contrived circuits, each initially frustrated
by an equally marvelous inhibiting circuit, suggests great virtuosity but not
true efficiency of design.

(3) If the block-diagram automaton is efficiently designed with respect
to the number of neurons used, the modifications in its behavior which would
result from the extirpation of a small fraction of its neurons would be most
striking. Some of its possibilities of learning would disappear, other abilities
would spring forth full-blown as their inhibiting mechanisms are inactivated,
without the normally required training program. Some abilities previously
acquired by training would be irrevocably lost. If the control circuits are
redundantly designed, reliability of operation being achieved at the cost of a
many-fold increase in the number of neurons used, similar effects would
follow the destruction of larger parts of the control circuits.

The effects of cerebral injury in the mammals present a quite different
picture (10). The resulting changes in behavior are often [...],
suggesting considerable redundancy of design; to some extent th[...]
are reparable by retraining. These changes are, from the [...] point of
view, uniformly pejorative. [As Wiener points out (17), prefrontal lobotomy
may usually be expected to increase a patient's tractability, not his wit.]
To find, for example, the victim of a brain accident subsequently in command
of a new language would occasion surprise.

(4) The complexity of the ingeniously contrived neural circuit
controlling the behavior of the block-diagram automaton might be comparable
with that of the intricate interconnection of the $10^{10}$-odd neurons of a human
central nervous system. The latter, however, is presumably built to a pattern
held in the $10^{4}$-odd genes controlling human heredity. [The number of human
gene positions has variously been estimated as 25,000 to 100,000 (12).] These
must also be presumed to determine the architecture of other than neural
tissues and much of intracellular physiology as well. Thus, it seems likely that
the genes serving to determine the circuitry of the human central nervous
system number no more than a few thousand. This consideration suggests
that the essential logical complexity of the human nervous system is far less
than the maximum which the number of neurons and synaptic junctions would
permit.



IV. The Hebb Model 
A very different model of a 
present oblique approach, but 


study of actual nervous systems. For this 
Й to regard his description as that of а 


mposing his model are given 
he human nervous system are 


ther than in a few synapses. Its 
interplay of stimulations, periods
[...] to its basic element than does the block-diagram model.
The [...] simplicity of the Hebb model lies in the bold [...]
[...] of the neurons are for the most part not planned.
[...] produced in vast numbers by a broadcast
[...] (by successive cell division) and to position and interconnect
[...] way which is determined by design only as regards gross
[...] features. The detailed wiring of the model, in which neurons
form synaptic junctions with others, is randomly determined. Here and
in what follows the term "random" is used in the lay sense (unplanned,
nondescript, determined by happenstance) rather than in the broader
[...] sense. The Hebb picture of the cerebral cortex may be likened
[...] of a forest. The complex matting of roots is not the
result of meticulous engineering but of the chance placement of the grains
[...] the drops of water, etc., which influenced the growth of each of the
[...] There are architectural features (tree roots by and large go deeper
[...]) but the precise configuration of roots is not subject to
[...] is not meant to suggest that the growth of a grass root, or of an
[...], is exempt from causality, but only that myriad other configurations
of roots would serve as well to nourish and support the forest.

The essence of Hebb's discussion lies in the observation that a large
[...] of neurons must be presumed to include many circular
(reverberatory) chains, capable of sustained activity when once excited.
[...] frequently excited by particular combinations or
constellations of stimuli tend to be fixed by neurobiotaxis and may be evoked
by progressively smaller aliquots of the constellation of stimuli initially
[...] The first-formed elementary reverberations will interact among
themselves to form higher-order associations and combinations, thus leading
to a complex hierarchical structure.

In a network of M. P. neurons provided only with excitatory
interconnections, a stimulus can more readily excite an appropriate response
than suppress other inappropriate responses. [Von Neumann (15) has shown
that networks of M. P. neurons can be given full logical universality
without the use of inhibitory interconnections. The device used, however, does
not lend itself to use in a random model.] The Hebb neuron, unlike the
M. P. neuron, displays a significantly long refractory period. This makes
possible the suppression or inhibition of reverberatory activity by excitatory
processes alone. If two reverberatory chains share the use of a number of
neurons, the excitation of one reverberation may, by fatiguing shared neurons,
tend to [...] the other. Similarly, any strikingly intense, widespread
[...]work, which may be identified with painful stimuli to
[...] break up the current large-scale pattern of activity.
Hebb’s model looks to this disruption of over-all patterns of activity by
intense stimulation to effect macroscopic (goal-directed) learning.


V. Possibilities of Computational Investigation


Hebb's extensive qualitative discussion has the aim of making plausible 
the view that the impressive abilities of a human nervous system are expli- 
cable with this (conceptually) simple set of hypotheses. The above discussion 
is intended as a description of this aim, not as a summary or as a critical 
review of Hebb’s argument. The success of such a plausibility-proof must 
be judged by each reader for himself. I find it profoundly convincing. 

The difficulties which stand in the way of a firm analytic proof of the 
adequacy of the Hebb model are highly formidable. These do not, as might 
first be thought, have primarily to do with the enormously large number of 
chaotically interconnected components. Statistical techniques for investi- 
gating the properties of such assemblages are reasonably well developed. 


to preserve reasonable verisimilitude, a 
cription of, say, 
described pattern 
30 small as to permit carrying out, with 
akes each region separately into account, 


er requirements may considerably simplify 
nteligence. If carried to completion, how- 
t the display of any goal-directed learning.
[...] to display shrewdness in a
[...] the model should thus stop short of
the removal of all affective inputs. It may suffice, however, to leave one
[...] feature of the input which, in the interpretation of the
behavior of the model, plays the role of a generalized indication of pain
(or alternatively of pleasure). Each particular environment of the model
will be specified by a functional dependence of the inputs to the model on
its (present and prior) outputs, i.e., its experiences are at least in part
determined by its behavior. The extent to which the behavior of the model
serves to diminish (or, in the alternative case, to increase) the frequency of
occurrence of the goal-associated input is then a measure of the goal-directed
learning displayed by the model in each environmental situation.

Even with these simplifications it is not clear that the present techniques
of mathematical analysis permit a more penetrating study of the expectable 
performance of stochastic neural network models than that to be found in 
Hebb's qualitative discussion. A more promising approach would seem to be 
offered by the use of a modern general-purpose digital computer to simulate 
the behavior of specific examples of the model in specific environmental 
situations. Some loss of mathematical rigor is unavoidable in this method 
of study since the performance of a few haphazardly selected examples would 
be taken as typifying the performances of the enormously large ensemble of 
possible realizations of each model, the details of which are randomly speci- 
fied. The danger of being misled by a fluke performance is, however, no 
greater than that which occurs in most stochastic investigations and admits 
the usual statistical safeguards. This technique of investigation of a highly
multidimensional ensemble by sampling, known as the Monte Carlo Method,
has been investigated by Ulam, von Neumann, and others (9). In some
applications this technique proves notably efficient (18). The study of the
Hebb model and other stochastic neural network models appears to be such
an application.

VI. The Three-Layer Model

Prior to his acquaintance with the Hebb model, the author initiated the
investigation of a stochastic neural-network model composed of M. P.
neurons (17). In this three-layer model the neurons are distributed upon a
surface in three classes (layers) serving particular purposes. One, the trunk
layer, was to transmit to all parts of the surface notice of the reception
anywhere of a painful stimulus. Another, the granular layer, was to record
certain special events by the initiation of spatially localized reverberations.
These reverberations may be extinguished by the passage of a wave of
excitation along the trunk layer. They thus provide a pain-limited temporary
memory of the occurrence of the special events which initiated them. In the
third, the primary layer, the neurons are interconnected by long-range
processes, unlike those of the trunk layer and granular layer, which have only
local connections. The simultaneous firing of two neighboring neurons of the 
primary layer constitutes the special event to be recorded in the granular 
layer. A reverberation in the granular layer is to produce a progressive 
increase in the sensitivity of neurons of the primary layer near to the neurons 
which initiated the reverberation. 

It was hoped that by appropriately specifying the statistical structure 
of its neural interconnections the network could be shown to display proper- 
ties suggestive of learning ability. This learning ability of the model may 
be expected to increase indefinitely (at least in some quantitative sense) as 
the size, but not the logical complexity, of the network is increased, hence 
without increase of the ingenuity invested in its design. 

It proved easy to show the desired properties for the trunk and granular 
layers (4). The trunk layer neurons were densely interconnected and given 
an appreciable refractory period. These parameters could be widely varied, 
still permitting the propagation of a wave of excitation. The neurons were 
arranged in a rectangular array, opposite edges of which were regarded as 
contiguous so as to make the surface topologically of spherical or, more 
conveniently, of toroidal character. In this way disturbances owing to 
atypical characteristics at boundaries were avoided. The wave of excitation 
would spread over the entire surface leaving behind a refractory zone and,
upon reconverging to a point, be extinguished.

A few possible structures for the granular layer were examined. The
performance of the layer was found to be favored by a short refractory
period and by considerable statistical fluctuation in the degree of local
connectivity; hence the name granular. This permits the maintenance of
many independent local reverberations while preventing any long-range
spreading of excitation. An extreme, perhaps trivial, solution is to give each
neuron unit threshold and one self-exciting process.

The study of the properties of the primary layer presented considerably
greater difficulty and seemed to require the use of calculating equipment
of greater speed and flexibility than was readily available. It was proposed
at that time (1949) that an electronic calculator of considerable memory
capacity be used in the further exploration of a three-layer model, with
attention focused chiefly on the primary layer.

In the summer of 1951, through the generosity of the National Research
Development Corporation and of the Department of Mathematics of the
University of Manchester, the author had opportunity to initiate this
exploration with the use of the Manchester Mark I computer. The results of
the calculations made at that time are described below. Although they are of a
qualitative and preliminary nature they support the suggestion that this
technique of investigation is likely to prove fruitful.


VII. Manchester Calculations


[...] of computational simplicity, to make no
[...] the neurons of the granular and trunk layers
[...] supposed effects on the thresholds of excitation of the
primary layer neurons in the rules governing the computations.

In a first series of calculations planned, a large number of neurons was 
to be represented. Each was to receive excitatory processes from a fixed or 
variable number of others selected by a random process. To avoid
overburdening the rapid-access memory capacity of the computer, it proved
convenient to represent the series of random numbers which describe the 
interconnections of the neurons by an algebraic formula from which they 
could be repeatedly calculated rather than to store the series in the memory. 
These numbers are thus not truly random, but have sufficient complexity to 
be considered quasi-random, i.e., sufficiently disorderly for the purpose.
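
The idea of recomputing the interconnections from a formula instead of storing them can be sketched as follows. The text does not give the formula that was actually used, so the linear congruential recurrence, its constants, and the function name `input_sources` are all assumptions chosen only for illustration.

```python
def input_sources(neuron, n_neurons, n_inputs,
                  multiplier=1664525, increment=1013904223):
    """Recompute the quasi-random source neurons feeding `neuron`.

    Nothing is stored: the same list is regenerated from the neuron's
    index whenever it is needed. The recurrence and its constants are
    illustrative assumptions, not the formula used in the original work.
    """
    modulus = 2 ** 32
    # Derive a starting state from the neuron index so that neighbouring
    # neurons do not receive near-identical sequences.
    state = ((neuron + 1) * 2654435761) % modulus
    sources = []
    for _ in range(n_inputs):
        state = (multiplier * state + increment) % modulus
        # The high-order bits of such a recurrence are better mixed
        # than the low-order bits, so those are used here.
        sources.append((state >> 16) % n_neurons)
    return sources  # may contain repeats or the neuron itself


# The wiring of neuron 7 is identical every time it is recalculated.
first = input_sources(7, n_neurons=1000, n_inputs=10)
second = input_sources(7, n_neurons=1000, n_inputs=10)
```

The trade is computing time for memory: each lookup costs a few multiplications, but no table of interconnections has to be held in the rapid-access store.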

In each cycle of the calculation the computer determines and displays
which of the neurons fire; information for use in the succeeding cycle is 
recorded. The firing of a neuron is determined by the number of neurons, 
among those from which it receives excitatory processes, which fired in the 
preceding cycle. If that number equals or exceeds its assigned threshold 
it fires, otherwise not. Again to avoid overburdening the computer memory 
it was at first planned to take no account of the number of cycles elapsed 
since the neuron last previously fired, i.e., not to assume a refractory period
exceeding one cycle. Modifications in the behavior of the network were to 
be effected by changes in the thresholds. 

As a preliminary to these experiments a series of calculations was carried
out to determine suitable ranges of the number or mean number of excitatory 
processes brought to each neuron and the constant or mean threshold. It 
soon became evident that no suitable values of these parameters could be 
found. For any value of the threshold exceeding unity the level of reaction 
was intrinsically unstable. The number of neurons firing either fell to zero 
after a few cycles or rose to almost the full number of neurons represented. 
An elementary statistical calculation shows that this behavior is to be
expected in the models as tried.
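
One form such an elementary calculation might take is sketched below. It assumes that each neuron receives a fixed number of excitatory inputs which fire independently with the current overall firing fraction p, so that the expected fraction firing in the next cycle is a binomial tail probability. The figures of 10 inputs and a threshold of 3 are illustrative assumptions, not values taken from the calculations described here.

```python
from math import comb


def next_fraction(p, n_inputs, threshold):
    """Chance that at least `threshold` of `n_inputs` independent inputs
    fired, given that each fired with probability p."""
    return sum(
        comb(n_inputs, j) * p ** j * (1 - p) ** (n_inputs - j)
        for j in range(threshold, n_inputs + 1)
    )


def iterate(p, n_inputs, threshold, cycles=20):
    """Apply the cycle-to-cycle map repeatedly, starting from fraction p."""
    for _ in range(cycles):
        p = next_fraction(p, n_inputs, threshold)
    return p


# Two starting levels on either side of the unstable intermediate level.
low = iterate(0.05, n_inputs=10, threshold=3)   # activity dies out
high = iterate(0.30, n_inputs=10, threshold=3)  # activity saturates
print(low, high)
```

Under these assumptions the map is flat near zero whenever the threshold exceeds unity, so a low level of activity decays further, while a level above the intermediate crossing point is driven toward the full number of neurons. This is the behavior reported above.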

Two methods of overcoming this difficulty presented themselves. One
is the use of neurons with definite refractory periods considerably exceeding
the synaptic delay time, i.e., many cycles of the calculation. This should tend
to depress the upper stable level of reaction by reason of the refractory
condition of most of the neurons at each cycle. This procedure seems
unattractive, since the firing of a neuron would then depend chiefly on its release
from inhibition rather than upon the immediately preceding pattern of
firings.

The second method of stabilizing the level of reaction is logically
appealing though seemingly unphysiological. It is to use only inhibitory rather
than excitatory interconnections of the neurons. This gives the level of 
reaction negative rather than positive feedback characteristics, thus producing 
a single stable level of reaction. This procedure avoids the necessity of 
maintaining in the computer memory a record of the number of cycles 
elapsed since the last previous firing of each neuron, as would be required if 
long refractory periods were taken into account. It was accordingly decided 
to adopt this latter procedure. 


VIII. The Linear Inhibitory Model


Having taken this one step away from physiological plausibility, another
became appealing. The complexity of the computational procedure can be
considerably reduced by limiting the group of neurons to which each neuron
may be responsive. Accordingly, it was decided to represent the neurons of
the primary layer as arranged in a circular sequence and to select the neurons
sending inhibitory processes to each neuron from among its forty predecessors,
each of these being included or excluded with equal probability. Initially each
neuron was given a threshold of inhibition of five, i.e., if five or more among
its twenty-odd selected predecessors had fired it did not fire; otherwise it did.
Arrangements were provided to replace this determination by selected firings of
[...] environmental stimuli. This [...]

[...] firing or not of the forty preceding neurons. Taking into account
the fact that in the mean only one-fourth of the neurons fire this
represents a store of only 3314 bits of information. It is therefore to be
expected that the appearance of randomness of the firing pattern is not 
deep-seated. 
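
The firing rule of the linear inhibitory model, as far as it is stated above, can be sketched in a few lines. The ring size of 200 neurons, the random starting pattern, the sweep order, and the function names are assumptions made for illustration; the input mechanism for environmental stimuli is omitted because its details are not recoverable from the text.

```python
import random


def build_inhibitors(n_neurons, span=40, rng=None):
    """For each neuron, pick its inhibitors from its `span` predecessors
    on the ring, each included or excluded with equal probability."""
    rng = rng or random.Random(0)
    return [
        [(i - d) % n_neurons for d in range(1, span + 1) if rng.random() < 0.5]
        for i in range(n_neurons)
    ]


def run(n_neurons=200, cycles=50, threshold=5, seed=0):
    """Sweep round the ring repeatedly; return the firing fraction per sweep.

    Ring size, starting pattern, and in-order sweep are assumptions.
    """
    rng = random.Random(seed)
    inhibitors = build_inhibitors(n_neurons, rng=rng)
    fired = [rng.random() < 0.25 for _ in range(n_neurons)]  # arbitrary start
    history = []
    for _ in range(cycles):
        for i in range(n_neurons):
            # A neuron fires unless `threshold` or more of its selected
            # predecessors are currently marked as having fired.
            count = sum(fired[j] for j in inhibitors[i])
            fired[i] = count < threshold
        history.append(sum(fired) / n_neurons)
    return history


history = run()
print(sum(history[10:]) / len(history[10:]))  # long-run firing fraction
```

Because every interconnection is inhibitory, a rise in activity suppresses later firings and a fall releases them, which is the negative feedback that holds the level of reaction near a single value.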

As a first simple learning experiment it was decided to promote the
firing of a particular neuron, making no use of the input mechanism
representing environmental stimuli. A neuron was chosen which, prior to the
learning experiment, fired in very nearly one-fourth of the cycles. Its firing
was taken as the criterion of a successful cycle. In the learning experiment
the model was run as initially set up until the first cycle in which the selected
neuron fired. The model was then rewarded as described above, i.e., each
neuron which had fired in that cycle had its threshold raised to six. On
superficial examination the experiment seemed strikingly successful;
thereafter the selected neuron fired in every cycle, although no further reward
was supplied! On closer examination, however, it appeared that the structuring
of the firing pattern resulting from the reward was excessive. After the
reward the model fell into an immutable pattern; each neuron either fired
in every cycle or in none. This fixed pattern was similar to that which occurred
in the rewarded cycle but not identical. Thus, if the selected neuron had
been one for which the two patterns differed, the experiment would have
seemed totally unsuccessful; the model would have learned the opposite
from the intended behavior.

The result of this experiment is unsatisfactory in another way. Since
the firing pattern was fixed by the first reward, the model was not capable
of further learning. A milder form of reward would presumably have been
preferable in diminishing the likelihood of fixing the wrong pattern of behavior.
It is also clear that a satisfactory model requires a much greater stock of
randomness in its initial behavior.
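
The reward step used in this experiment can be written out as a small sketch. The function name and the list representation of the thresholds are assumptions; only the rule itself (every neuron that fired in the rewarded cycle has its threshold of inhibition raised to six) is taken from the text.

```python
def reward(thresholds, fired, rewarded_threshold=6):
    """Return new thresholds of inhibition after one reward.

    Each neuron that fired in the rewarded cycle has its threshold raised
    to `rewarded_threshold`; the others are left unchanged. A higher
    threshold of inhibition makes a neuron harder to suppress, so the
    rewarded neurons become more likely to fire again.
    """
    return [
        max(t, rewarded_threshold) if f else t
        for t, f in zip(thresholds, fired)
    ]


# Four neurons at the initial threshold of five; the first and third fired.
updated = reward([5, 5, 5, 5], [True, False, True, False])
```

Applying the change to every neuron that fired, in a single step, is what makes this form of reward so strong; a milder variant would alter fewer thresholds or alter them by less.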

This result suggests that a learning mechanism may need to be guarded 
against excessively rapid learning, which could lead to its leaping to un- 
justified conclusions. The optimum learning rate would seem to be determined 
by the opposing hazards of accidental learning, brought about by statistical 
fluctuations and accidental correlations of actually unrelated things, and
the dangers (depending on the environmental circumstances) of learning 
too slow. It is interesting to speculate on the effect of the considerable 
increase in the prevalent life span [...] experienced
in the course of its last few thousand generations. It seems possible that 
this has made more prevalent the defect of human intelligence that arises 
from too-rapid learning. It may appear that the typical mentally maladapted 
individual has not learned too little but rather too much that is not true.

It has not as yet proved possible to test the behavior of this model as
modified in the ways suggested by this result. It is the author’s hope that
the suggestion of success shown by this model will serve to stimulate similar
investigations in some laboratories having electronic computing machines.


BOOK REVIEWS


J. P. GUILFORD. Psychometric Methods. (2nd Ed.) New York: McGraw-Hill, 1954, pp.
ix + 597.

Psychometric Methods has been renovated and enlarged for its second edition. Although 
it retains the character of its 18-year old predecessor, the new edition includes some major 
changes. An introductory chapter on measurement theory has been added. Most of the 
statistical topics have been removed, including the old chapters on simple and multiple 
correlation. The treatment of psychophysics and scaling has been improved by forming 
chapters on psychophysical theory and on principles of judgment from material formerly 
interspersed in the descriptions of the specific methods. Psychological testing now occupies
three chapters instead of one. Finally, the problems at the end of each chapter have been
revised and are accompanied by answers whenever practicable.

The book begins with a lucid account of the logical basis of psychological
measurement, and a discussion of nominal, ordinal, interval, and ratio scales. This is followed by a
comparison of the classical psychophysics of Weber ratios, difference limens, and Fechner’s
law, with the modern psychophysics of discriminal dispersions, the law of comparative
judgment, and stimulus-response matrices. (S is used for stimulus and R for response,
instead of the awkward reversal perpetrated by the Germans.) The third chapter covers
mathematical functions, curve fitting, and probability distributions. The major
psychophysical methods and scaling methods are covered in the next seven chapters. Included
are the methods of average error, minimal changes, constants, pair comparisons (Guilford
prefers pair to paired), rank order, equal sense distances (bisection), equal-appearing
intervals, fractionation, constant sums, and successive categories. Experimental designs
and computational procedures are illustrated for each method, the pros and cons are
discussed, and variants of the methods are noted in short paragraphs. Short sections are also
devoted to allied problems, including multidimensional scaling, the objectivity of
judgments, and the prediction of first choices. The scaling material concludes with chapters on
rating scales and principles of judgment. The latter includes discussions of judgment times,
the time-order error, anchoring, judgment sets, regression phenomena, and Helson's concept
of adaptation level.

The field of testing requires three long chapters. The discussion of test theory includes
a detailed account of the theory based on independent true and error scores, and brief
mention of some new additions or alternatives proposed by Lord, Loevinger, Ferguson,
and others. Speed and power problems and scoring problems are discussed. Reliability,
validity, and item analysis are treated at length. A brief account is given of attitude scale
construction. The final chapter is devoted to factor analysis. It includes discussion of general
issues in factor analysis, and provides detailed recipes for centroid factoring and graphic
rotation of axes.
It is not possible to encompass all of present-day psychometrics in a single volume.
However, Guilford has managed to include most of the popular techniques, and has provided
[...] references for those who want supplementary information. In the areas of
psychophysics and scaling, where references are scattered and reviews are few, Guilford's
treatment [...] than in the areas of testing and factor analysis, where other good
summaries are available. There was obviously not space enough to include all of the new
[...]. The up-and-down method, probit analysis, and Coombs' general
approach to scaling all deserve more extended treatment than they receive. In general,
though, Guilford’s coverage is excellent.

Guilford’s treatment of the method of successive categories is the weakest section
of the book. In contrast to the usual clarity of exposition, this section is fuzzy and very hard
to follow. (There are some printing errors to add to the confusion.) Some of the details are
right, others are wrong, or at least dubious. His method for estimating category boundaries,
or limens, is standard, but then he suggests locating the stimuli by finding the interpolated
medians of the judgment distributions on this scale of limens. According to him, means are
harder to find, and in either case, trouble arises when judgment distributions are truncated,
i.e., when many judgments are in an extreme category. Actually, when appropriate
procedures are used, the stimulus means are easy to determine, and the method is indifferent
to truncation. The basic difficulty with the presentation is that the successive categories
model is never stated explicitly. In fact, Guilford seems to reject the model when he argues
that the categories themselves should somehow be scaled rather than the boundaries
between categories. His procedure for scaling the categories makes no sense to this reviewer.
Since the method of successive categories has great utility, this section of the book is
especially disappointing.


Sensory psychology. 
ection in vision and 
ho now find psychophysies 


on, masking, and target, det 


ight interest students w 
dull. 


In a book of this sort, to introduce formulas magically, 
either because there is not s аз, or because the development would 
be beyond the mathematic iliti 


. In Psychometric Methods magic is 
Several places where a few words of explanation would 


for tests, Guilford states that R 
student realizes that ri 


a straight line 


i y algebraic discussion is 
likely to seem magical. Eebraie discussio 


y wed exam; ichardson formula 20, 
which is w m p's can be inserted. 
It is not s 


quivalent to K. R. 20. 


ated in the text that Тиске 
lies a lack of equivalence, A 


in the numerical examples, the 
formulas. Three are wi i e. In formula (10.7), p. 253, 
f 1 ; the > in the denominator should 
; n formula (16.1 › р. als indicating the fourth root
deviates and ordinates for various values of the area, is especially valuable and can be
found almost nowhere else.

The major changes in Psychometric Methods are in scope and organization. The reader's
[...] of the second edition can be predicted accurately from his estimate of its
predecessor.


Massachusetts Institute of Technology Bert F. Green


C. RADHAKRISHNA RAO. Advanced Statistical Methods in Biometric Research. New York:
John Wiley and Sons, 1952, pp. xvii + 390.


This book should be of much interest to social scientists and other investigators who
are so often confronted with data requiring multivariate analysis of one kind or another.
The style, which presupposes a working knowledge of elementary statistics, is a combination
of terse mathematical statement followed by examples, mostly from the fields of
anthropology and genetics. A psychological application (p. 316, p. 370) is of importance, for it
[...] the solution to the problem of “types,” under the restriction that measurement
dispersion matrices for the types are identical.

The first chapter neatly summarizes that part of matrix algebra most useful in
statistics, including quadratic forms. Also, the technique of pivotal condensation for evaluating
determinants and matrix inverses is first discussed here, and throughout the book the
value of this method for simplifying computations is ably demonstrated.

The second chapter gives statistical distributions in common usage, followed by the 
multivariate distributions required for tests treated in subsequent chapters. Some practical 
insight into the use of distributions for constructing multivariate tests is provided by this 
chapter and Chapter 7. 

The remaining chapters are oriented toward testing hypotheses, with adequate 
emphasis on cases where variates are correlated. The last three chapters contain, with 
only minor revisions, the author's previously published contributions to the theory and use 
of classification, or discrimination, functions. The value of this work for psychologists and 
anthropologists can hardly be overemphasized. 

Chapter 4 contains an interesting and original presentation of maximum likelihood 
estimation, where the Fisherian concepts of efficient scoring and amount of information in 
scores are illustrated. Chapters 3-6 contain useful sections on many of the traditional
problems of inferential statistics. Analysis of variance is discussed only briefly, but a new
technique for obtaining an interaction sum of squares is given, and a problem requiring
classical analysis of covariance is fully illustrated. (Generalized analysis of variance and
covariance, or “analysis of dispersion,” is treated in Chapter 7.) The sections on chi-square
are clear and relatively complete; for example, included is the evaluation of 2 X 2 tables
with more than one degree of freedom, the use of Dandekar's (instead of Yates’) correction,
and a more exact approximation to the normal distribution than that obtained by using
$\sqrt{2\chi^2} - \sqrt{2n - 1}$. An equation on page 197 is incorrect, but in the context the slip is
obvious to the reader. 

Several appendices are included, one of which contains a number of original lemmas
on classificatory problems; another contains two methods for applying a Schmidt
transformation to obtain uncorrelated variates.

The main disadvantage of the book lies in the fact that most readers who will want
to use the methods may find it difficult to make rather abrupt transitions from very general
mathematical thinking to concrete applications. In other words, the book may be too
difficult for those for whom the applications seem most pertinent. Also, psychologists may
be disappointed that the final chapter contains so little on factor analysis. The author
makes use of canonical variates and correlations without clearly relating them to Hotelling's
method of factor analysis. But such criticisms are unimportant in view of the many
remarkable contributions so adequately and creatively utilized in this volume.


Mellon Foundation 


Vassar College Harold Webster 


RAYMOND B. CATTELL. Factor Analysis: An Introduction and Manual for the Psychologist
and Social Scientist. New York: Harper & Bros., 1952. Pp. xiii + 462.

In the Preface of Factor Analysis, Cattell has set forth three principal requirements
which the book should fulfill: (1) “... to meet the need of the general student in science to
gain ideas of what factor analysis is about and to understand how it integrates with scientific
methods and concepts generally,” (2) to serve “as a textbook for statistics courses which
deal with factor analysis for the first time, either as an appreciable part or as the whole
of the semester course,” and (3) “... to supply a handbook for the research worker, the
student, and the statistical clerk which will be a practical guide with respect to carrying
out the processes most frequently in use.” To achieve these three objectives Cattell has
written the book in three sections: I Basic [...], II Specific Aims
and Working Methods, and III General Principles [...]

[...] factorial designs. Moreover, he has [...]
only in theory construction, but [...] evaluate [...]
shortcomings are apparent:

(1) [...] made to explain too many methods of
factor extraction relative to the limited [...]. A somewhat [...]
extensive explanation of a fewer number of techniques might have been desirable.

(2) The steps involved in the various clustering methods do not seem to be easy for
the beginner to grasp, since the illustrative examples are not clearly related to the procedures
described. For example, the explanation of the group method of factoring (pp. 174-8) 
seems to be unnecessarily confusing and ambiguous. The origin of the entries appearing 
in the table at the top of page 176 remains a mystery to the reviewer. 

(3) The format of the computational explanations is such that one cannot grasp in a
readily-apparent fashion the objectives toward which the writer is trying to lead the reader. 
Paragraph captions or headings would be particularly helpful. In short, each of the steps 
involved in the calculations is simply not clearly set forth for the reader to perceive. Each 
rule or procedural item should be directly related to a specific numerical operation. 

(4) The explanation concerning the rotation process through use of graphs is sub- 
stantially inadequate if the text is to serve as a manual. What is seriously needed is a set 
of graphs to illustrate in a step-by-step fashion the solution of a representative problem 
involving between 10 and 20 test variables. In addition, a paragraph or two in which an 
explanation is given as to why each rotation was undertaken would be most helpful to 
the beginning student. Both orthogonal and oblique rotations should be considered at 
much greater length. Although mastery of the art of rotation requires extensive experience, 
a list of guiding principles that are related to illustrative plots would constitute an important
teaching aid. 

(5) The presence of numerous errors is particularly annoying and confusing to both 
the beginner and the experienced worker. One rather serious mistake occurs in the equation 
near the top of page 232. Instead of Е, or Vr = VodF the equation should be written as 
Р, or Vp = VAF™, There is some doubt as to whether the geometric interpretation of 
reflection in centroid extraction that is presented on page 54 is correct. Numerous minor 
errors are present. A few examples may be cited: a double negative on page 157, line 9, 
which would not seem to be intended; one numerical entry of 0.37 in line 3 of the second 
paragraph on page 160 when the value of 0.38 is intended; misplacement of the decimal
point of the numerical entry in the denominator of the fraction appearing at the bottom of
page 160; an incorrect numerical entry (4.45 instead of 4.72) in the denominator of the
fractions from which m is calculated on page 172; an apparently erroneous value of .10
instead of about .15 in the second row and first column of Table 26 on page 201; the use of
communality when square root of communality is intended on line 17 of page 205; the
use of “are” when “is” is intended on the fifth line from the bottom of page 256; an
incorrect reference to Table 27 on page 214; and, least important, the misspelling of the
reviewer's name.

(6) Much needed is a summary in one location of the matrix equations that are 
frequently employed in factor-analysis studies—a set of 12 or 15 equations that show 
various interrelationships among the primary-factor, reference-factor, arbitrary-orthogonal-
factor loadings, the intercorrelations of the factors (both types), and the relationship of
correlation coefficients to various types of factor loadings. Such a summary would serve 
to unify much of the illustrative material. 

These remarks represent to a large extent a consensus based on the numerous state- 
ments of students who have used the book as a text and upon the comments of professors 
who have either required the book in courses or have attempted to use it as a manual in 
their own research. In short, the third requirement has not been realized.

Since the book as a manual is somewhat limited in the clarity of its exposition with 
respect to the use of numerical procedures, the second requirement concerning its function 
as a textbook has not been met to an adequate degree. It would appear that the instructor 
in factor analysis would need to require a second text to supplement the content of Cattell's 
Factor Analysis if skills in factoring are to be gained. Students have consistently reported 
that it has been necessary to consult other sources at length to clarify what are essentially 
routine steps involved in clerical procedures. 
One of the most pleasing features of the book is Cattell’s style of writing, which is 
informal and conversational in its tone. His ample use of cleverly devised figures of speech 
such as similes, personifications, and metaphors offers many an opportunity for a smile 
as well as a refreshing change of perspective in the reader's orientation to the field of 
abstractions that pour forth page after page. A few examples may be definitive: 

“This business of reflecting, however, can become as exasperating as trying to hold 
three footballs in two hands; for as we make r’s positive as a whole for one variable, we 
make some individual r’s in the column negative for other tests” (p. 55). 

“The search for common characteristics in the loaded varial[...] first hunch as to the 
nature of the factor is beset by [...] very high, and always presents possibilities of being 
[...] say frivolous, example, if two drunken men and two [...] and one of the former had 
had Scotch and soda while [...] de persons [...] ble to its proper negli[...]” (pp. 75-76). 


“and it has frequently happened that a reference vector which has obstinately 
eluded stabilization has been led to a recognizable hyperplane by this method as soon as 
all its fellow reference vectors have become sufficiently convincing in their hyperplanes to [...] 

“The points are gradually shepherded into a restricted area [...] being tracked down by 
[...] are sheep by [...] 

In its current form the book is an excellent [...] principles of factor an[...] of factor 
analysis in experimental design, [...] of problems in the social sciences for which factor 
analysis may be useful. However, as [...] actual performance of a factor analysis [...] the 
book is of doubtful value [...] as a manual, it would be a useful [...]ysis. 
University of Southern California William B. Michael 
Kit of Selected Tests for Reference Aptitude and Achievement Factors. [...] 
Service, Princeton, New Jersey: October, 1954. 


Tests in this ki 
for any of the , 


ber of disturbing thoughts 
ttee discussion really well 
З consent with equanimity 


Is the democra: 
of scientific tr 
to have elements defined in terms of voting and a show of hands rather than in terms of 
decisive proof? Would the inclusion of non-American writers (Vernon, Meili) have decisively 
altered the contents of this kit? These are important problems, but they are not discussed 
in the manual. Perhaps the employment of some such technique as Ahmavaara’s 
(Ahmavaara, Y. Transformation analysis of factorial data. Helsinki: Suomalaisen Tiedeaka- 
temian. 1954) might have helped to objectify judgments. 

Another difficulty will strike many readers. Some of the factors possess a high degree 
of generality, such as the general reasoning factor; others, like the aiming factor, are very 
specific indeed. To put both types, as well as factors of intermediate coverage, into the 
same kit raises problems of specificity and generality which, again, are not discussed in the 
manual. Nor is there a discussion of the very meaning of the term "aptitude" used to 
designate these different types of factors. In the reviewer’s department, tests with high 
loadings on the “aiming” factor have been found to be excellent measures of temperamental 
[...] to what extent can we rest content with having them treated as pure ability 
tests? 

Another difficulty that arises is due to the failure of the committees to consider 
evidence from outside the factorial field. To take but two simple examples, we may wonder 
to what extent reactive inhibition (I_R) and conditional inhibition (sI_R) play a part in the 
performance of dotting, aiming, and tracing tests. To what extent, also, do individual 
differences in inhibition formation determine score? On a rather higher plane we may ask 
about the determination of the results on all the tests included of orectic factors, which 
have been shown (Furneaux, W. D. Some speed, error, and difficulty relationships within 
a problem-solving situation. Nature, 1952, 170, 37) to exert an important and variable 
influence. The fact that none of these objections are discussed or met by French is less 
his fault than that of factor analysts, who in general tend to pay little attention to the 
findings of general psychology in their work. Nevertheless, it does tend to make this collec- 
tion less valuable than it might otherwise have been. It also gives a false feeling of security 
to investigators who may wish to work in this field. 

This brings us to the last point. The kit apparently is intended for research workers 
who may wish to use these tests as reference markers. It is difficult to see why in this case 
the actual tests have been included with the manual. Research workers in any case will 
have to obtain sets of tests which they wanted to employ, and they would certainly be 
expected to be familiar with the current literature and the tests included in the kit if the 
research to be done were to be taken seriously. For the purpose of ensuring the use of 
reference markers, therefore, the manual itself would have been quite sufficient. It seems 
to the reviewer that the main use of the kit will be, not for research workers, but for in- 
structors who wish to show their students illustrative tests of the main factors isolated 
by factor analysts. For this purpose the kit is admirably selected and constructed; it is 
to be hoped, however, that in his discussion the instructor will not forget to include some 
of the matters raised in the first few paragraphs of this review. 


Department of Psychology H. J. Eysenck 
Institute of Psychiatry 
University of London 


SIR GODFREY THOMSON 


Sir Godfrey Thomson 


It was at the International Congress of Psychology, 1923, that I first met 
Godfrey Thomson. We were in the same symposium on the nature of intelli- 
gence. In correspondence and in personal conferences I have found him always 
friendly and intellectually generous even when we did not agree in our psy- 
chological interpretations. I always read his criticisms with interest and 
respect. An outstanding characteristic was that he never falsified a problem 
in order to win an argument—a trait that was not shared by some of his 
adversaries in the controversies of mental measurement. 

Godfrey Thomson was born in Carlisle, England, on March 27, 1881. He 
was educated at Rutherford College, Armstrong College (now King’s College), 
the University of Durham, and the University of Strasbourg. At Armstrong 
College he was Open Exhibitioner, Junior Pemberton Scholar, and Charles 
Mather Scholar. Later he was appointed Pemberton Fellow of the University 
of Durham, where he obtained the M.Sc. degree in mathematics and physics. 
Following this he attended the University of Strasbourg in Germany and was 
awarded the Ph.D., summa cum laude, in 1906. 

At this point his interest turned from the physical sciences to psychology 
and he returned to the University of Durham for postgraduate study in that 
subject. After receiving the D.Sc. degree in Psychology in 1913 he accepted 
the position of Lecturer in Education at Armstrong College. In 1920 he be- 
came Professor and Head of the Department of Education; he held this posi- 
tion until 1925. During this period he visited the United States as Visiting 
Professor of Education at Columbia University, 1923-24. A second visit to 
this country was in 1933 when he was a lecturer in the Yale Summer School. 

From 1925 until his retirement in 1951 he held the joint post of Professor 
of Education at the University of Edinburgh and Director of Studies, Edin- 
burgh Provincial Committee for the Training of Teachers. In 1939 the Uni- 
versity of Durham awarded him an Honorary D.C.L. Later he was awarded 
the Order of Polonia Restituta (third class) by the Government of Poland in 
exile, and in 1949 he was knighted. Sir Godfrey Thomson died in Edinburgh 
on February 9, 1955, at the age of 73. 

Godfrey Thomson was a fellow of the Royal Society of Edinburgh, of the 
Eugenics Society, and of the British Psychological Society, of which he was 
president, 1945-46. He was an Honorary Fellow of the Educational Institute 
of Scotland, and of the Swedish Psychological Society. He was a member of 
the British Association for the Advancement of Science, the National Insti- 
tute of Industrial Psychology, the International Statistical Institute, and of 
a large number of boards and foundations. 

Sir Godfrey Thomson had many connections with scientific societies in 
the United States: Foreign Honorary Member of the American Academy of 
Arts and Sciences, Foreign Associate of the United States National Academy 
of Sciences, Fellow of the American Association for the Advancement of 
Science, member of the American Institute of Mathematical Statistics, and 
member of the Psychometric Society. 

He devised tests of intelligence and achievement which were well-known 
and widely used in the British Isles and throughout the Commonwealth. 


With the profits from the sale of these tests he founded scholarships and endowed 
the Godfrey Thomson Lectureship in Educational Research in Edinburgh University. 

Sir Godfrey Thomson's work in men 
three successive periods. First he w. 
beginning in 1911. His work was p 
ment by Brown and Thomson and i 


Psychometric Laboratory 
University of North Carolina L. L. Thurstone 


1 PSYCHOMETRIKA—VOL. 20, No. 3 
SEPTEMBER, 1955 


A GENERALIZED SIMPLEX FOR FACTOR ANALYSIS* 


LOUIS GUTTMAN 
THE ISRAEL INSTITUTE OF APPLIED SOCIAL RESEARCH, 
JERUSALEM, ISRAEL 


By a simplex is meant a set of statistical variables whose interrelations 
reveal a simple order pattern. For the case of quantitative variables, an 
order model was analyzed previously which allowed only for positive cor- 
relations among the variables and a limited type of gradient among the 
correlation coefficients. The present paper analyzes a more general model 
and shows how it is more appropriate to empirical data. Among the novel 
features emerging from the analysis are: (a) the "factoring" implied of the 
correlation matrix; (b) the use of a non-Euclidean distance function; and 
(c) the possible underlying psychological theories. 


In a new approach to factor analysis, called radex theory, it has been 
shown (3, 4) how two important special cases arise: the simplex and the 
circumplex. Only a restricted case of the simplex was considered parametrically 
in (3), allowing only positive correlations among the observed variables and 
only a limited type of gradient among the correlation coefficients. The purpose 
of the present paper is to give a parametric theory and analysis of a more 
general type of simplex. In this generalization, a more flexible gradient is 
possible, and negative correlations can appear as well as positive ones. Thus, 
"inhibiting" as well as "reinforcing" factors can be considered. Generalizing 
the parametric system for a simplex immediately suggests analogous 
generalizations for the circumplex, and hence also for a complete radex. We shall 
consider here only the simplex, and it will be clear what the implications 
are for the circumplex and radex. 

As in conventional factor analysis, we consider a universe of tests for 
a population of subjects. Both the universe and the population are usually 
theoretically indefinitely large, and in practice only a finite sample is drawn 
from each. It will be convenient to consider a finite battery of n tests from 
the universe, but to consider the population of testees to be infinitely large 
so that we need not be concerned with sampling error due to people. We 
shall then be able to see what happens as n increases. 

A particularly curious result of the present analysis is as follows. It turns 
out that in terms of ordinary factor analysis, one should factor not the co- 


I. Introduction 


*Read at the International Congress of Psychology, Montreal, June 7-12, 1954. 
This research was facilitated in part by an uncommitted grant-in-aid to the writer from 
the Behavioral Sciences Division of the Ford Foundation. 
variance matrix of our generalized simplex, [...] matrix. The factoring implied is of two 
kinds. First, the firs[...] of Thurstone, should be factored from the inverse [...] 
then the principal components should be taken as the remaining n [...] This particular 
way of regarding the factor resolution turns out [...] theoretical and practical 
implications for the present simplex theory. 

A second and most highly important result reveals a limitation of 
considering variables as points only in a Euclidean space. Regarded this 
way, our simplex appears n-dimensional, or with as many Euclidean dimensions 
as distinct variables. However, when distances between these same 
points are measured in a certain non-Euclidean fashion, then the points can 
be plotted on a straight line, or they form a one-dimensional non-Euclidean 
system. 


Further novel features appear in our generalized simplex with respect 
to the psychological theories that can possibly account for it. 


II. General Notation 


Let $t_{ij}$ be the observed score of person $i$ on test $j$. The mean and standard 
deviation of each test are arbitrary, and indeed are usually [...] procedure (3). One part 
of the problem is to express each $t_{ij}$ as the sum of two types of components: 
common and deviant (or "unique"). Let $e_{ij}$ be the score of person $i$ on the 
deviant component of test $j$. Then we can write, for all $i$ and $j$, 

$$t_{ij} = w_j s_{ij} + e_{ij}, \tag{1}$$

where $s_{ij}$ is the structural or non-deviant part of $t_{ij}$, and $w_j$ is a multiplying 
constant to allow for the arbitrariness of the standard deviation of the 
observed $t_{ij}$. Especially in the simplex theory to follow, the standard deviations 
of the $s_{ij}$ are not in general arbitrary, [...] 

Since the present simplex theory is concerned only with covariances 
between the $s_{ij}$, it will be convenient to consider the mean of each to be zero, 

$$E\, s_j = 0 \qquad (j = 1, 2, \ldots, n). \tag{2}$$

Various laws of deviation are possible for the $e_j$, as pointed out in (3). 
The one assumed in conventional factor analysis is the $\delta$-law, 

$$\operatorname{cov}(e_j, s_k) = \operatorname{cov}(e_j, e_k) = 0 \qquad (j \neq k). \tag{3}$$

A well-known consequence of (3) and (1) is that 

$$\operatorname{cov}(t_j, t_k) = w_j w_k \operatorname{cov}(s_j, s_k) \qquad (j \neq k). \tag{4}$$

According to (4) [...] 
for the main diagonal. Any submatrix in the one that involves no main 
diagonal element must have exactly the same rank as the corresponding 
submatrix in the other. This suggests one way of testing hypotheses about 
the $s_{ij}$, insofar as these lead to conditions on the ranks of certain submatrices. 

The $\delta$-law (3) may or may not be true in practice for a given set of 
data. One approach to testing it for some data is by image analysis (6). 
We shall be concerned here primarily with structural laws or theories for the 
$s_{ij}$, and the truth or falsity of the deviance law (3) is a subsequent problem 
to be explored ultimately with empirical data in any given case. 


III. Review of Previous Data and Theory 


Several correlation matrices published earlier in the literature by various 
writers have now been re-analyzed and found to form approximate simplexes. 
These data represent a wide variety of mental abilities and personality 
traits (1, 3, 4). Two examples are shown in Tables 1 and 2. One is of a battery 


TABLE 1 

Correlations Among Six Numerical Ability Tests* 

                                     Sub-      Multi-              Arithmetical  Numerical
Test                      Addition   traction  plication  Division  Reasoning     Judgment

Addition                    1.00       .62       .62        .54        .29          .28
Subtraction                  .62      1.00       .67        .53        .38          .37
Multiplication               .62       .67      1.00        .62        .48          .52
Division                     .54       .53       .62       1.00        .62          .57
Arithmetical Reasoning       .29       .38       .48        .62       1.00          .64
Numerical Judgment           .28       .37       .52        .57        .64         1.00

*From Table 2, pp. 110-112 of (11). See analysis in (3). 


TABLE 2 

Correlations Among Six Tests of a Certain Type of Verbal Ability* 

                                              Word      Verbal
Test                  Proverbs  Vocabulary  Checking  Enumeration  Association  Synonyms

Proverbs                1.00       .55        .29        ...          ...         .17
Vocabulary               .55      1.00        .46        .44          ...         .24
Word Checking            .29       .46       1.00        .56          .34         .22
Verbal Enumeration       ...       .44        .56       1.00          .43         .27
Association              ...       ...        .34        .43         1.00         .45
Synonyms                 .17       .24        .22        .27          .45        1.00

*Called "abstractness of verbalization" in (4, p. 13 ). Data from Appendix Table 1 
of (12): tests 43, 45, 58, 57, 6 and 55. 
of numerical ability tests, and the other is of a certain type of verbal ability 
tests. 

From mere inspection of Tables 1 and 2, it is clear that there is some 
kind of order relationship within each battery of tests. In each case, the 
largest correlations are next to the main diagonal, and taper off to the 
northeast and southwest corners of the table. No other arrangement of the rows 
and columns of the tables, or reshuffling of the order of the variables, will 
[...] one could regard the variables [...], and the correlation of one variable 
[...] along this line. 

One of the interesting new parametric properties to be developed in 
the simplex theory of the present paper is that simplex variables can be 
literally plotted as points along a straight line, with distances between them 
being strictly additive. 

It has been shown in [...] will yield a gradient among correlation coefficients 
[...] characteristics of the em[...] $x_{ic}$ denote the score of [...]. It is convenient 
to assume also that the means of the $x_{ic}$ are zero. Thus, the assumptions so far 
can be written as 

$$E\, x_c = 0 \qquad (c = 1, 2, \ldots, n) \tag{5}$$

and 

$$E\, x_b x_c = 0 \qquad (b \neq c;\ b, c = 1, 2, \ldots, n). \tag{6}$$

We assume further that there is an order within the $s_j$ and also within 
the $x_c$ such that for all $i$ and $j$ the following factor law of formation holds: 

$$s_{ij} = \sum_{c=1}^{j} x_{ic} \qquad \text{(additive restricted simplex)}. \tag{7}$$

Let $\sigma_{s_j}$ and $\sigma_{x_c}$ be the standard deviations of $s_j$ and $x_c$, and 
let $\rho_{s_j s_k}$ be the coefficient of correlation between $s_j$ and $s_k$. It is easily 
proved from (5), (6), and (7) that 

$$\sigma_{s_j}^2 = \sum_{c=1}^{j} \sigma_{x_c}^2 \qquad (j = 1, 2, \ldots, n) \tag{8}$$

and 

$$\rho_{s_j s_k} = \frac{\sigma_{s_j}}{\sigma_{s_k}} \qquad (j \le k). \tag{9}$$
According to (8), $\sigma_{s_k}$ increases as $k$ increases, so that for fixed $j$ in (9) the 
right member must decrease as $k$ departs from $j$. This describes a gradient 
in the correlation coefficients of the $s_j$, which when modified in the $t_j$ by 
the presence of error as in (1) (say that (3) and (4) hold) can approximately 
give rise to observed gradients such as in Tables 1 and 2. 

Another way of writing law (7) is 

$$s_j = s_{j-1} + x_j. \tag{10}$$

Equality (10) asserts that $s_j$ is the same as its predecessor $s_{j-1}$, except for 
the addition of a new factor. Interpreting Table 1 this way would imply 
that, apart from deviant factors of the $e_j$ type, the subtraction test involves 
the same $x_1$ as does the addition test, but also an $x_2$ not called on by the 
addition test. The multiplication test calls on both $x_1$ and $x_2$, but also on 
an $x_3$, etc. A corresponding explanation would hold for the hierarchy among 
the verbal ability tests of Table 2. 
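
As a rough numerical illustration of the additive law (7), the sketch below builds the correlation matrix implied for the structural parts when the added factors are uncorrelated with zero means, so that the covariance of two structural parts equals the variance of the earlier one. The function name and the factor variances are hypothetical choices made for illustration; they are not values taken from Tables 1 or 2.

```python
import math

def additive_simplex_correlations(x_variances):
    """Correlations among s_j = x_1 + ... + x_j for uncorrelated, zero-mean x_c.

    Assumption: cov(s_j, s_k) = var(s_min(j,k)), which follows from the
    additive law together with zero means and uncorrelated factors.
    The input variances are illustrative only.
    """
    cumulative = []
    total = 0.0
    for variance in x_variances:
        total += variance
        cumulative.append(total)  # var(s_j) = sum of var(x_c) for c <= j
    n = len(cumulative)
    return [
        [math.sqrt(min(cumulative[j], cumulative[k]) / max(cumulative[j], cumulative[k]))
         for k in range(n)]
        for j in range(n)
    ]

if __name__ == "__main__":
    for row in additive_simplex_correlations([1.0, 1.0, 1.0, 1.0]):
        print("  ".join(f"{value:.2f}" for value in row))
```

Printing the matrix for equal factor variances shows the largest correlations next to the main diagonal and a steady taper toward the corners, which is the kind of gradient described for the two tables.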

It has been shown in (3) how an entirely different factor law can give 
rise to exactly the same type of correlation matrix as in (9). Instead of having 
factors $x_c$ that are added according to (7), it is possible to write a law wherein 
factors are multiplied by each other and yet yield a hierarchy of correlations 
identical with (9). Even other laws may yield exactly the same results. 

But it has also been pointed out in (3) that the detection and use of 
the simplex pattern does not at all depend on knowing whether law (7) 
holds or some alternative law leading to identical results. It is sufficient 
to determine the law of formation of the correlation coefficients, say such as (9), 
and for this the specification of an underlying law of factors such as (7) is 
not strictly necessary. 

An important feature of a matrix with elements of the form (9) is that, 
if $\sigma_{s_j} \neq \sigma_{s_k}$ whenever $j > k$, then the matrix is nonsingular. Furthermore, 
the inverse of this nonsingular matrix has zero elements everywhere except 
in the main diagonal and in the immediately adjacent diagonals. This has 
profound implications for prediction problems, since the elements of the 
inverse matrix are the basis for the multiple regression weights for any linear 
multiple regression on the $s_j$. This also has profound implications for the 
internal structure of the $s_j$, for these vanishing elements of the inverse show 
that the principal components of the $s_j$ satisfy a certain second-order linear 
difference equation, and hence must obey a certain general oscillatory law 
of formation (2, 3). 

We now wish to generalize law (9). We shall do this in two steps. The 
first stage is to use a generalization of law (7) for expository purposes. 

IV. A First Parametric Generalization of the Additive Simplex 


It is clear from (9) that only positive correlations can arise from the 
restricted hypothesis (7). But surely there must be an order system which 
would also allow for negative correlations. It is also verifiable from (9) 
that any tetrad, or second-order minor determinant, must vanish if all of 
its elements are on one side of the main diagonal of the correlation matrix 
(and not vanish if elements come from both sides of the main diagonal). 
Could there be an order system that does not lead to such a restrictive 
condition on the rank of parts of the matrix? 

A generalization of (7) that does relax these restrictions somewhat is 
as follows. In (7), each $x_c$ operates as an "all or none" affair, in the sense 
that $s_j$ does not involve $x_c$ whenever $c > j$. For $c > j$, then, let us assume 
there is an alternative set of factors operating, say some $y_c$. 

Let $y_{ic}$ be the score of person $i$ on alternative factor $y_c$. For convenience, 
assume the means of the $y_c$ are zero, 

$$E\, y_c = 0 \qquad (c = 1, 2, \ldots, n). \tag{11}$$

Analogous to (6), we assume the $y_c$ to be uncorrelated with each other, 

$$E\, y_b y_c = 0 \qquad (b \neq c;\ b, c = 1, 2, \ldots, n). \tag{12}$$

We also assume $y_c$ to be uncorrelated with $x_b$ whenever $b \neq c$, 

$$E\, x_b y_c = 0 \qquad (b \neq c). \tag{13}$$

Let $\gamma_c$ denote the covariance between $x_c$ and $y_c$, 

$$\gamma_c = E\, x_c y_c \qquad (c = 1, 2, \ldots, n). \tag{14}$$

[...] the size or sign of $\gamma_c$ for any $c$. [...] can arise from different psychological 
[...] $y_c$ might be an [...] negative. Or $x_c$ and $y_c$ [...] inhibition) of the same 
[...] might be positive. [...] 

$$s_{ij} = \sum_{c=1}^{j} x_{ic} + \sum_{c=j+1}^{n} y_{ic}. \tag{15}$$

[...] of additive simplex 
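In place of (8) we now get 

$$\sigma_{s_j}^2 = \sum_{c=1}^{j} \sigma_{x_c}^2 + \sum_{c=j+1}^{n} \sigma_{y_c}^2. \tag{16}$$

It is also easy to derive from (11), [...] 

$$\operatorname{cov}(s_j, s_k) = \sum_{c=1}^{j} \sigma_{x_c}^2 + \sum_{c=k+1}^{n} \sigma_{y_c}^2 + \sum_{c=j+1}^{k} \gamma_c \qquad (j \le k). \tag{17}$$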
From (17) and (16), 

$$\operatorname{cov}(s_j, s_k) = \sigma_{s_j}^2 + \sum_{c=j+1}^{k} (\gamma_c - \sigma_{y_c}^2) \qquad (j \le k), \tag{18}$$

so that (9) generalizes to 

$$\rho_{s_j s_k} = \frac{\sigma_{s_j}}{\sigma_{s_k}} + \frac{1}{\sigma_{s_j} \sigma_{s_k}} \sum_{c=j+1}^{k} (\gamma_c - \sigma_{y_c}^2) \qquad (j \le k). \tag{19}$$

Since the second terms on the right of (18) and of (19) can be negative 
(especially when $\gamma_c < 0$ for some or all of the $c$), the left members can also 
be negative upon occasion. Thus, law (15) allows also for possible negative 
correlations among the $s_j$. 

The rank condition on the correlation matrix resulting from (9) is also 
relaxed a bit, according to (19). To see this, it is easiest first to deal with 
the covariance matrix defined by (18). Taking first differences with respect 
to $k$, we see that 

$$\operatorname{cov}(s_j, s_{k+1}) - \operatorname{cov}(s_j, s_k) = \gamma_{k+1} - \sigma_{y_{k+1}}^2 \qquad (j \le k). \tag{20}$$

In the matrix of order $n \times (n - 1)$ defined by the left member of (20), all 
submatrices with elements all on one side of the main diagonal are clearly 
of rank one at most, according to the right member of (20). Hence, in the 
$n \times n$ matrix of the elements defined by (18), all corresponding submatrices 
cannot be of rank greater than two. But the rank of any submatrix in 
$[\rho_{s_j s_k}]$ is the same as of the corresponding submatrix in $[\operatorname{cov}(s_j, s_k)]$ since 
the rows and columns of one differ from those of the other only by constants 
of proportionality. Hence the rank of any submatrix of $[\rho_{s_j s_k}]$ cannot be 
greater than two when all its elements are on one side of the main diagonal. 
Formula (19) will of course allow for a closer fit to data such as in Table 1 
and Table 2 than will formula (9). This may be needed especially to account 
for the aberration of the subtraction test from a simple gradient; apparently 
subtraction differs from addition and multiplication in somewhat of another 
manner than called for by law (7), and law (15) may be more appropriate. 
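
A small numerical sketch may help fix ideas about the first generalization. It assumes the law takes the form s_j = x_1 + ... + x_j + y_(j+1) + ... + y_n, with the x's mutually uncorrelated, the y's mutually uncorrelated, and x_c correlated only with y_c (covariance gamma_c); under those assumptions the covariance of s_j and s_k splits into the three sums computed below. The function name and all parameter values are hypothetical.

```python
from fractions import Fraction

def generalized_covariance(var_x, var_y, gamma):
    """Covariance matrix of s_j = x_1+...+x_j + y_(j+1)+...+y_n.

    Index c is 0-based here; gamma[c] is the covariance of x_c with y_c.
    All inputs are illustrative assumptions.
    """
    n = len(var_x)
    cov = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        for k in range(j, n):
            value = (sum(var_x[:j + 1])         # x_c present in both s_j and s_k
                     + sum(gamma[j + 1:k + 1])  # y_c in s_j paired with x_c in s_k
                     + sum(var_y[k + 1:]))      # y_c present in both s_j and s_k
            cov[j][k] = cov[k][j] = Fraction(value)
    return cov

if __name__ == "__main__":
    example = generalized_covariance([1] * 4, [1] * 4, [Fraction(-1, 2)] * 4)
    for row in example:
        print([str(value) for value in row])
```

With negative gamma_c the covariance between the first and last variables in this example becomes negative, and the first differences along a row to the right of the diagonal do not depend on the row, which is the property used in the rank argument.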


V. A Second Parametric Generalization 


A formulation like (15) is helpful in trying to understand what kinds 
of processes can possibly give rise to order relations among observed 
correlation coefficients. However, a formula like (19) can divert attention from the 
main consequences of having order relationships. One might be tempted 
to focus, for example, on the problem of estimating the $\gamma_c$ and the $\sigma_{y_c}$ to be 
used in (19). Clearly, an analysis based on observed correlation coefficients 
alone can only hope at best to estimate the differences $(\gamma_c - \sigma_{y_c}^2)$, and not 
each term separately. That is, a correlational analysis alone cannot hope 
to piece out all the details of a process such as (15). Even if this were possible, 
there are many important things to be learned about $[\rho_{s_j s_k}]$ that do not 
need specification of these details. 

We shall now give the main generalization of the simplex intended 
in this paper. It involves no explicit use of underlying factors $x_c$, $y_c$, or 
any others. Its focus is on what can be learned by a correlational analysis 
alone. 

Each of laws (7) and (15), given also the assumptions (6), (12), and 
(13), satisfies the following necessary condition: 

$$E\, (s_j - s_k)(s_k - s_l) = 0 \qquad (j \le k \le l). \tag{21}$$

This is an order condition among the $s_j$, and yet needs no detailed 
specification of an underlying factor mechanism. All that is hy[...] $- s_l$ whenever [...] 
is that we can regard [...] but we can specify an [...] $d_{jk}$ be defined by 

$$d_{jk} = E\, (s_j - s_k)^2 \qquad (j, k = 1, 2, \ldots, n). \tag{22}$$

Now, we can write the identity 

$$s_j - s_l = (s_j - s_k) + (s_k - s_l). \tag{23}$$

Taking expectations of the squares of both sides of (23) shows that the 
following theorem is true. 

THEOREM 1. A necessary and sufficient condition for the order relation 
(21) to hold is that 

$$d_{jl} = d_{jk} + d_{kl} \qquad (j \le k \le l). \tag{24}$$

[...] This makes the dimen[...] the rank of the matrix [...]. 
Now this [...] (21) holds, or $n$ Euclidean 
dimensions are required. Using the non-Euclidean metric of (22) leads to 
but a one-dimensional space, according to Theorem 1. 
It should be remarked that the distance function (22) does not yield 
a metric space in the general case of arbitrary variables, for the requisite 
triangular inequality need not be satisfied. However, we are using it here 
only for the special case where (21) holds, so the space of the specific points 
involved is certainly metric, being even one-dimensional in the sense of (24). 
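
The additivity of the squared-difference distances under the order condition, and its failure for arbitrary variables, can be seen on two small covariance matrices. The sketch below is an illustration only: `ORDERED` is a hypothetical covariance matrix of the form 4 - (3/2)|j - k|, which satisfies the order condition, and `UNORDERED` is an equal-covariance matrix chosen for contrast because it does not.

```python
from fractions import Fraction

def squared_distances(cov):
    """d_jk = E(s_j - s_k)^2 = var(s_j) + var(s_k) - 2 cov(s_j, s_k)."""
    n = len(cov)
    return [[cov[j][j] + cov[k][k] - 2 * cov[j][k] for k in range(n)] for j in range(n)]

def is_additive(d):
    """Check d_jl = d_jk + d_kl for every ordered triple j <= k <= l."""
    n = len(d)
    return all(d[j][l] == d[j][k] + d[k][l]
               for j in range(n) for k in range(j, n) for l in range(k, n))

half = Fraction(1, 2)
# Hypothetical covariances sigma_jk = 4 - (3/2)|j - k| (order condition holds)
ORDERED = [[4 - 3 * half * abs(j - k) for k in range(4)] for j in range(4)]
# Equal covariances of 1/2 with unit variances (order condition fails)
UNORDERED = [[1, half, half], [half, 1, half], [half, half, 1]]

if __name__ == "__main__":
    print(is_additive(squared_distances(ORDERED)))
    print(is_additive(squared_distances(UNORDERED)))
```

In the ordered example each distance is three times the separation in rank, so the four points can be laid out on a line with strictly additive distances, even though the covariance matrix itself is of full rank.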

The writer first used a metric of the type (22) in the context of the princi- 
pal components of scale analysis of qualitative data (8), and this suggested 
the developments presented here for a simplex of quantitative variables. 


VI. The Rank of Certain Submatrices 


From now on we shall be concerned largely with the covariances among 
the $s_j$, so it will be convenient to let $\sigma_{jk}$ denote the covariance between 
$s_j$ and $s_k$, 

$$\sigma_{jk} = \operatorname{cov}(s_j, s_k) = E\, s_j s_k \qquad (j, k = 1, 2, \ldots, n). \tag{25}$$

We wish to prove the following theorem: 

THEOREM 2. If $n$ variables $s_j$ satisfy the order condition (21), then any 
submatrix of $[\sigma_{jk}]$ cannot be of rank greater than 2 if all its elements are on one 
side of, or on, the main diagonal. 

For the proof, we first expand (21), using notation (25), to obtain 

$$\sigma_{jl} = \sigma_{jk} + \sigma_{kl} - \sigma_{kk} \qquad (j \le k \le l). \tag{26}$$

Since $[\sigma_{jk}]$ is a symmetric matrix, it suffices to consider only submatrices on 
one side of the main diagonal, say with all elements to the right of (or above) 
the diagonal. By differencing (26) with respect to $j$ we see that 

$$\sigma_{j+1,k} - \sigma_{jk} = \sigma_{j+1,l} - \sigma_{jl} \qquad (j < k \le l). \tag{27}$$

According to (27), all elements to the right of the main diagonal and in the 
same row of the $n \times (n - 1)$ matrix $[\sigma_{j+1,k} - \sigma_{jk}]$ are equal. Hence no 
submatrix which is all to one side of the main diagonal can have a rank exceeding 
unity. Consequently, the corresponding submatrices in $[\sigma_{jk}]$ cannot have 
ranks greater than 2, or Theorem 2 is proved. 
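
Theorem 2 can be checked on a concrete matrix with exact arithmetic. The sketch below is hedged in two ways: the covariance matrix is generated from the first generalization, assumed here to be s_j = x_1 + ... + x_j + y_(j+1) + ... + y_n with x_c correlated only with y_c, and every numerical parameter is a hypothetical choice. The block `UPPER_BLOCK` lies wholly above the main diagonal.

```python
from fractions import Fraction

def generalized_covariance(var_x, var_y, gamma):
    """Covariances of s_j = x_1+...+x_j + y_(j+1)+...+y_n (0-based indices)."""
    n = len(var_x)
    cov = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        for k in range(j, n):
            value = sum(var_x[:j + 1]) + sum(gamma[j + 1:k + 1]) + sum(var_y[k + 1:])
            cov[j][k] = cov[k][j] = Fraction(value)
    return cov

def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(value) for value in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    found = 0
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found

# Hypothetical parameters for six variables
VAR_X = [1, 2, 3, 4, 5, 6]
VAR_Y = [1, 1, 1, 1, 1, 1]
GAMMA = [Fraction(0), Fraction(1, 2), Fraction(-1, 2),
         Fraction(1, 3), Fraction(-1, 3), Fraction(1, 4)]
COV = generalized_covariance(VAR_X, VAR_Y, GAMMA)
UPPER_BLOCK = [[COV[j][k] for k in (3, 4, 5)] for j in (0, 1, 2)]

if __name__ == "__main__":
    print(rank(UPPER_BLOCK), rank(COV))
```

The three-by-three block above the diagonal has rank 2 in this example while the full matrix is nonsingular, which is the contrast the theorem describes.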


VII. The Problem of Weights for Principal Components 


Related to Theorem 2, but perhaps more striking, are two laws of formation: 
one for the inverse matrix and one for the principal components of 
$[\sigma_{jk}]$ when (21) holds. 

Before stating these laws, we first wish to take into account the fact 
that the principal components of a covariance matrix depend in part on the 
weight functions used, or the relative sizes of the standard deviations of 
the variables concerned. The components may shift also as one removes 
[...] sample from an infinite universe of variables. [...] 
each $t_j$ has a different $e_j$, many can have exactly the same $s_j$, or be aimed 
at exactly the same aspect of the underlying simplex. Let $f_j$ be the relative 
frequency of $s_j$ in this sense; that is, $f_j$ is the proportion of all the $t_j$ which 
have the same $s_j$ in (1). Then 

$$\sum_{j=1}^{n} f_j = 1. \tag{28}$$

Next, we shall allow for the possibility that it is not the $\sigma_{jk}$ themselves to 
be analyzed, but perhaps the $\rho_{jk}$ or some other weighted function of the 
$\sigma_{jk}$. Let $v_j$ be the weight associated with $s_j$. Thus, if the principal 
components [...] $= 1/\sigma_j$. If the principal components [...] $= w_j$. In general, 
[...], and we wish to know the principal components of the Gramian matrix 
$[v_j v_k \sigma_{jk}]$ when relative frequency $f_j$ [...] and column $j$. 

Let $\lambda$ denote a latent root of the matrix, and let $z_j$ be the $j$th element 
of the associated latent vector. Our job is to solve the stationary equations 
(cf. 3) 

$$\sum_{j=1}^{n} f_j v_j v_k \sigma_{jk} z_j = \lambda z_k \qquad (k = 1, 2, \ldots, n). \tag{29}$$
isi 
To simplify notation for the solution, let 

$$u_j = z_j / v_j, \qquad a_j = f_j v_j^2 \qquad (j = 1, 2, \ldots, n). \tag{30}$$

Then (29) can be rewritten as 

$$\sum_{j=1}^{n} a_j \sigma_{jk} u_j = \lambda u_k \qquad (k = 1, 2, \ldots, n). \tag{31}$$

It should be remarked that, from (30), the $a_j$ are always non-negative, 
even though the $v_j$ may be negative. There is no loss of generality, then, 
in assuming all the $a_j$ to be positive, 

$$a_j > 0 \qquad (j = 1, 2, \ldots, n), \tag{32}$$

for if $a_j = 0$, this would be equivalent to $f_j = 0$, or no $s_j$ to begin with to 
use in (31). 
basic importance in its own right. It provides the regression coefficients in 
multiple correlation problems involving all the $s_j$, and it provides the partial 
correlation and multiple correlation coefficients involved. In short, it is a 
basic tool of image analysis (6). 

Eventually one would like to know about the inverse of the observed 
correlation matrix $[\rho_{t_j t_k}]$. Since this will depend partly on the deviance law 
of the $e_j$ in (1), all we shall do in the present paper is analyze the case where 
there is no error; we shall concentrate only on $[\sigma_{jk}]$. But even so, it is important 
to allow for the frequency function $f_j$, and to be concerned ultimately with 
the infinite universe of variables and not just a finite observed sample 
therefrom (6, 7). 

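If $[\sigma_{jk}]$ is nonsingular, let $\sigma^{jk}$ denote the typical element of the
inverse matrix. The inverse must be symmetric, since the covariances are. Thus,

\[ \sigma^{jk} = \sigma^{kj} \qquad (j, k = 1, 2, \ldots, n). \tag{33} \]

If $\delta_{jk}$ denotes Kronecker's delta, then

\[ \sum_{j=1}^{n} \sigma_{jk} \sigma^{jl} = \delta_{kl} \qquad (k, l = 1, 2, \ldots, n). \tag{34} \]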
j=l 


We shall solve (34) for $\sigma^{jk}$ by a differencing process.
The following differencing notation will be used. If $z_k$, $z_{jk}$, or $z_{kl}$ are
any quantities to be differenced with respect to k, then

\[ \Delta_k z_k = z_{k+1} - z_k, \qquad \Delta_k z_{jk} = z_{j,k+1} - z_{jk}, \qquad \Delta_k z_{kl} = z_{k+1,l} - z_{kl}. \tag{35} \]


If we let $l = k + 1$ in (26), the equations can be rewritten as

\[ \Delta_k \sigma_{jk} = \begin{cases} \sigma_{k,k+1} - \sigma_{kk} & (j \le k) \\ \sigma_{k+1,k+1} - \sigma_{k,k+1} & (j > k) \end{cases} \qquad (k = 1, 2, \ldots, n-1). \tag{36} \]


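Differencing both members of (34) with respect to k and using (36) yield

\[ (\sigma_{k,k+1} - \sigma_{kk}) \sum_{j=1}^{k} \sigma^{jl} + (\sigma_{k+1,k+1} - \sigma_{k,k+1}) \sum_{j=k+1}^{n} \sigma^{jl} = \Delta_k \delta_{kl} \qquad (k = 1, \ldots, n-1;\ l = 1, \ldots, n). \tag{37} \]

Let $\alpha_l$ be the sum of the elements in the $l$th column (row) of $[\sigma^{jl}]$,

\[ \alpha_l = \sum_{j=1}^{n} \sigma^{jl} \qquad (l = 1, 2, \ldots, n). \tag{38} \]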


Also, notice from (22) that

\[ d_{k,k+1}^2 = \sigma_{kk} - 2\sigma_{k,k+1} + \sigma_{k+1,k+1} \qquad (k = 1, 2, \ldots, n-1). \tag{39} \]

By bringing in the notion of a frequency function $f_j$ we are in effect assuming
our n points $s_j$ to be distinct, or that

\[ d_{k,k+1}^2 > 0 \qquad (k = 1, 2, \ldots, n-1). \tag{40} \]

Let $b_k$ and $c_k$ be defined respectively as

\[ b_k = (\sigma_{k+1,k+1} - \sigma_{k,k+1}) / d_{k,k+1}^2, \qquad c_k = 1 / d_{k,k+1}^2 \qquad (k = 1, 2, \ldots, n-1). \tag{41} \]

Then by using (38), (39) and (41), we obtain from (37) that

\[ \sum_{j=1}^{k} \sigma^{jl} = \alpha_l b_k - c_k \Delta_k \delta_{kl} \qquad (k = 1, \ldots, n-1;\ l = 1, \ldots, n). \tag{42} \]

Taking first differences in (42) with respect to k yields the important
second-order difference equation

\[ \sigma^{k+1,l} = \alpha_l \Delta_k b_k - \Delta_k (c_k \Delta_k \delta_{kl}) \qquad (k = 1, \ldots, n-2;\ l = 1, \ldots, n). \tag{43} \]


We now wish to obtain an explicit formula for $\alpha_l$ in (43). As is well
known for Kronecker deltas,

\[ \sum_{l=1}^{n} \delta_{kl} = 1, \qquad \sum_{l=1}^{n} \Delta_k \delta_{kl} = 0. \tag{44} \]


Hence, if we let $\alpha$ be the sum of all $n^2$ elements of $[\sigma^{jl}]$, or

\[ \alpha = \sum_{l=1}^{n} \alpha_l = \sum_{j=1}^{n} \sum_{l=1}^{n} \sigma^{jl}, \tag{45} \]

and if we sum both members of (42) over l, we obtain

\[ \sum_{j=1}^{k} \alpha_j = \alpha b_k \qquad (k = 1, 2, \ldots, n-1). \tag{46} \]

Since [o^] must be Gramian if i
to be a quadratic form ove
and finally, for $k = n - 1$, (46) shows that $\alpha - \alpha_n = \alpha b_{n-1}$, or

\[ \alpha_n = \alpha (1 - b_{n-1}). \tag{50} \]

Therefore, if we let

\[ g_k = \begin{cases} b_1 & (k = 1) \\ \Delta_k b_{k-1} & (k = 2, 3, \ldots, n-1) \\ 1 - b_{n-1} & (k = n), \end{cases} \tag{51} \]

we can write our desired formula compactly as

\[ \alpha_k = \alpha g_k \qquad (k = 1, 2, \ldots, n). \tag{52} \]

It also follows from (51) that

\[ \sum_{k=1}^{n} g_k = 1, \tag{53} \]

so (52) cannot be used to obtain an explicit formula for $\alpha$.

An explicit formula for $\alpha$ is easily obtained as follows. Multiply both
members of (38) by $\sigma_{kl}$ and sum over l. Recalling (34), (44), and (52), we
see that, changing subscripts,

\[ \alpha \sum_{j=1}^{n} g_j \sigma_{jk} = 1 \qquad (k = 1, 2, \ldots, n), \tag{54} \]

or

\[ \alpha = 1 \Big/ \sum_{j=1}^{n} g_j \sigma_{jk} \qquad (k = 1, 2, \ldots, n). \tag{55} \]
Using notation (51), and shifting notation from k + 1 to k, we can now
rewrite (43) as

\[ \sigma^{kl} = \alpha g_k g_l - \Delta_k (c_{k-1} \Delta_k \delta_{k-1,l}) \qquad (k = 2, 3, \ldots, n-1;\ l = 1, 2, \ldots, n). \tag{56} \]


Now, (56) gives all the elements of $[\sigma^{kl}]$ directly except for the first and
last rows (k = 1 and k = n). These "boundary conditions" are obtained
from (42). Setting k = 1 and using notation (50) show that

\[ \sigma^{1l} = \alpha g_1 g_l - c_1 \Delta_1 \delta_{1l} \qquad (l = 1, 2, \ldots, n). \tag{57} \]

Setting k = n − 1 in (42), and using (38) and (51), show that

\[ \sigma^{nl} = \alpha g_n g_l + c_{n-1} \Delta_{n-1} \delta_{n-1,l} \qquad (l = 1, 2, \ldots, n). \tag{58} \]


IX. The Inverse Matrix and the Ranks of Its Parts


To see more graphically what the inverse matrix defined by (56), (57),
and (58) looks like, let $c_{kl}$ be defined, for all l, as

\[ c_{kl} = \begin{cases} -c_1 \Delta_1 \delta_{1l} & (k = 1) \\ -\Delta_k (c_{k-1} \Delta_k \delta_{k-1,l}) & (k = 2, 3, \ldots, n-1) \\ c_{n-1} \Delta_{n-1} \delta_{n-1,l} & (k = n). \end{cases} \tag{59} \]

The right member of (59) expands into the following explicit statement of
the elements of $[c_{kl}]$:

\[ [c_{kl}] = \begin{bmatrix} c_1 & -c_1 & & & \\ -c_1 & c_1 + c_2 & -c_2 & & \\ & -c_2 & c_2 + c_3 & \ddots & \\ & & \ddots & \ddots & -c_{n-1} \\ & & & -c_{n-1} & c_{n-1} \end{bmatrix} \tag{60} \]

$[\sigma^{kl}]$ can now be regarded as the sum of two matrices, for we can write

\[ \sigma^{kl} = \alpha g_k g_l + c_{kl} \qquad (k, l = 1, 2, \ldots, n). \tag{61} \]

Now, from (59) and (44), or from (60),

\[ \sum_{k=1}^{n} c_{kl} = 0 \qquad (l = 1, 2, \ldots, n), \tag{62} \]


or the columns (rows) of [с] are linearly de 
It has been proved i 


(60) must all be distinct. 


are either of rank zero or one. "gam 
On the other hand, the matrix [09,01 

product of a vector and i 

[о] cannot vanish, else t 

the singular [с]. 


assured by (53), 

From the conclusions 
Theorem 2 holds for [7*] as well as for [c;,]. (Indeed rri n- 
published theorem that sho ыен и 


parts of an inverse and the corresponding parts of the original matrix for
any nonsingular matrix. We have merely worked out a special case here.)


X. Implications for Statistical Prediction


While the one-sided rank 2 condition holds for $[\sigma^{kl}]$, it is the details that
are important. In general, the elements of an inverse matrix depend on all
the elements of the original matrix, and will change as the order of the matrix
is increased or decreased. But in (61), the only over-all factor that changes as
variables are added to or removed from the battery is $\alpha$, or the sum of all the
elements in the inverse.

A coefficient $g_k$, as defined by (51) and (41), depends only on the variances
and covariances of $s_k$ and its immediate neighbors $s_{k-1}$ and $s_{k+1}$. A
coefficient $c_{kl}$ will vanish, according to (60), unless $l = k - 1$, $k$, or $k + 1$.
Therefore, if an $s_{n+1}$ is added beyond the $s_n$ of the given simplex, none of
these coefficients will change except those for $s_n$. Or if a point is inserted
between $s_k$ and $s_{k+1}$ in the simplex order, this will change coefficients
associated only with points in the neighborhood of this new point. Thus again,
as discussed in great detail in (3), in the multiple linear regression of any
$s_k$ on the remaining n − 1 distinct variables of the simplex, the multiple
regression weights and multiple correlation coefficients depend essentially on
the law of neighboring of the points of the simplex.

Again, the possibility appears that $s_k$ can be essentially as predictable
from $s_{k-1}$ and $s_{k+1}$ as it is from all the n − 1 distinct variables in the
simplex apart from itself. We shall now see how, under certain circumstances,
$\sigma^{kl}$ is determined largely by $c_{kl}$ and hardly by $\alpha g_k g_l$.

Specifically, we shall prove the following theorem: 


THEOREM 3. If $\lambda_0$ is the smallest latent root of $[\sigma_{jk}]$, then

\[ \alpha \le 1 \Big/ \Bigl( \lambda_0 \sum_{j=1}^{n} g_j^2 \Bigr). \tag{63} \]


If Ж $5 give as No, then а — 0. 

j=l 
For the proof, multiply both members of (54) by $g_k$, sum over k and use
(53) to see that

\[ \alpha \sum_{j=1}^{n} \sum_{k=1}^{n} g_j g_k \sigma_{jk} = 1. \tag{64} \]

Now, the value $\lambda_0$ is the smallest obtainable by the quadratic form on the
left of (64) when the $g_j$ are normalized, or

\[ \sum_{j=1}^{n} \sum_{k=1}^{n} g_j g_k \sigma_{jk} \ge \lambda_0 \sum_{j=1}^{n} g_j^2. \tag{65} \]

Hence, (63) follows from (64) and (65), and the theorem is established.

In circumstances where the $g_j$ are approximately equal among themselves,
quantities of the form $g_k g_l / \sum_j g_j^2$ will be of the order of 1/n. Then, if the
smallest root $\lambda_0$ does not tend to zero with n, or if it tends to zero at a
slower rate than the order of 1/n, it follows from Theorem 3 that
$[\alpha g_k g_l] \to 0$ and from (61), $[\sigma^{kl}] \to [c_{kl}]$.

к EO Д 0 же, be a special case of an e-simplex as defined 
in (3). The general definition of an esimplex is essentially €— 
for finite n, in the sense that it is concerned only with limits as $n \to \infty$.
It simply states that multiple regression coefficients should tend to zero for
non-neighboring tests, or elements more than one diagonal away from the
main diagonal of the inverse matrix should tend to zero as n increases. The
simplex defined by law (21) can, therefore, be a special kind of ε-simplex.


XI. The Difference Equations for Principal Components 

Having an explicit formula such as (61) for the inverse matrix helps
us also to study the principal components defined by (31). Multiply both
members of (31) by $\sigma^{kl}$ and sum over k, to obtain, revising subscripts,

\[ a_k u_k = \lambda \sum_{j=1}^{n} \sigma^{jk} u_j \qquad (k = 1, 2, \ldots, n). \tag{66} \]
im] 
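Let $\beta$ be defined by

\[ \beta = \sum_{k=1}^{n} g_k u_k. \tag{67} \]

Then using (61) and (67) in (66) shows that

\[ a_k u_k = \lambda \Bigl( \alpha \beta g_k + \sum_{j=1}^{n} u_j c_{jk} \Bigr) \qquad (k = 1, 2, \ldots, n). \tag{68} \]

Now, the summation on the right is also expressible as first- and second-
order differences among the $u_j$. For, using (59), we see that

\[ \sum_{j=1}^{n} u_j c_{jk} = \begin{cases} -c_1 \Delta_1 u_1 & (k = 1) \\ -\Delta_k (c_{k-1} \Delta_k u_{k-1}) & (k = 2, 3, \ldots, n-1) \\ c_{n-1} \Delta_{n-1} u_{n-1} & (k = n). \end{cases} \tag{69} \]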
Thus, (68) ean be regarded a, 


: 3 5 expressing a second-order diffar equation 
with two first-order boundary conditi diferente en 


Strictly speaking, howey 
is involved in (68), for в depe 


-order difference equation 
if 8 = 0, then (68) certain 


i , according to (67). However, 
ight order. If В = 0, we can 
nd regard the unknown to be 
ined only up to a constant of 




proportionality in any event, this is one way of taking up this degree of 
freedom. 

The properties of the solutions to (68) in the general case remain to be
explored. The previous special case of the restricted simplex in (3) is where
$g_1 = 1$ and $g_j = 0$ for $j = 2, 3, \ldots, n$. Then $\beta$ in (67) is simply
$\beta = g_1 u_1$ and (68) is

\[ a_k u_k = \lambda \sum_{j=1}^{n} (c_{jk} + \delta_{1j} \delta_{1k}\, \alpha g_1^2)\, u_j \qquad (k = 1, 2, \ldots, n). \tag{70} \]

The matrix implied by the parentheses on the right differs from $[c_{jk}]$ in (60)
merely by adding the quantity $\alpha g_1^2$ to $c_{11}$, or the element $c_1$ in the first row
and column. This merely changes the first boundary condition, as obtainable
from (69), but leaves the rest of (69) unchanged.

The solutions to (70) have the law of oscillation discussed in (2) and (3).

Another special case of interest is where $g_j$ is constant for all j. From
(53), this implies

\[ g_j = 1/n \qquad (j = 1, 2, \ldots, n). \tag{71} \]

If in addition, weights are chosen so that $a_j$ is constant for all j, say

\[ a_j = a \qquad (j = 1, 2, \ldots, n), \tag{72} \]

then it is easily seen from (68), (67) and (62) that $[g_j]$ is a latent vector
with latent root $\lambda = na/\alpha$. Since all other latent vectors must be orthogonal
to this one, it follows from (67) that $\beta = 0$ for the remaining latent vectors,
and we are back to our standard type of difference equation for these remaining
vectors; they are the vectors of $[c_{kl}]$. Hypotheses (71) and (72) lead to the
case where the centroid is the same as a latent vector.

This raises the following question. If a resolution into components
is desired, why not work in any case with those indicated by the formula
for the inverse matrix? Certainly, basic structure properties are revealed
by (61). If we again assume (72), then the first centroid loadings of (61)
are $\sqrt{\alpha}\, g_k$ $(k = 1, 2, \ldots, n)$. According to the general formulas of (9) and
(10), any Gramian matrix can have its rank reduced by extracting a centroid;
the process is not restricted to correlation matrices and can be used
on $[\sigma^{kl}]$ in particular. If we subtract out the contributions of these loadings
from $[\sigma^{kl}]$, then we are left with the matrix of rank n − 1, $[c_{kl}]$, which now
has an interesting law of principal components.

The factoring law suggested by this, then, is first to remove the first
centroid, and then resolve the rest into principal components.

Since we are not factoring $[\sigma_{jk}]$ here but its inverse, we are not factoring
the observed scores. Rather, by implication we are factoring the anti-image
scores, for $[\sigma^{jk}]$ is closely related to the covariances among the anti-images
of the $s_j$ (6). That factoring a Gramian matrix is equivalent to factoring a
score matrix of which it is the product has been proved in (9) and discussed
also in (10).


If (72) does not hold, then a more general weighted average is called
for than the centroid to remove the term in $\alpha g_k g_l$ from (61).


XII. The Sufficiency of the Formula for the Inverse Matrix


Up until now, we have not mentioned a somewhat important question.
Under what conditions does (61) provide a matrix that is actually inverse
to $[\sigma_{jk}]$? We have arrived at (61) by assuming $[\sigma_{jk}]$ to be nonsingular and
(40) to hold. But can $[\sigma_{jk}]$ be nonsingular if law (21) holds? Fully to establish
(61), we must prove that (34) actually holds assuming that $[\sigma_{jk}]$ obeys law
(21), or (36).

A first indispensable assumption clearly is that (40) holds, for if two
points coincide, $[\sigma_{jk}]$ must obviously be singular. Next, let us examine the
assertion in (55) that the sums of the rows of $[\sigma_{jk}]$, when weighted by the $g_j$,
are constant. Let $h_k$ be defined as

\[ h_k = \sum_{j=1}^{n} g_j \sigma_{jk} \qquad (k = 1, 2, \ldots, n). \tag{73} \]


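Differencing both members of (73) with respect to k and using (36) yield

\[ \Delta_k h_k = (\sigma_{k,k+1} - \sigma_{kk}) \sum_{j=1}^{k} g_j + (\sigma_{k+1,k+1} - \sigma_{k,k+1}) \sum_{j=k+1}^{n} g_j. \tag{74} \]

From (51) and (53),

\[ \sum_{j=1}^{k} g_j = b_k, \qquad \sum_{j=k+1}^{n} g_j = 1 - b_k \qquad (k = 1, 2, \ldots, n-1). \tag{75} \]

Multiply both members of (74) by $c_k$, and use (75) and notation (41), to obtain

\[ c_k \Delta_k h_k = (b_k - 1) b_k + b_k (1 - b_k) = 0 \qquad (k = 1, 2, \ldots, n-1). \tag{76} \]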
Therefore AJ, = 0 for 


all possible № 
Should the constan си 


t value of h; 
ing linear dep 


ie | me the constant value of h, to be different 
5 constant value by 1/о, or define a b 
м у (55). 
We can now go ahead to define a matrix $[\sigma^{kl}]$ by (61), and proceed to
prove that it satisfies (34). Multiply both members of (61) by $\sigma_{jk}$, sum over k,
and use (55) to see that

\[ \sum_{k=1}^{n} \sigma_{jk} \sigma^{kl} = g_l + \sum_{k=1}^{n} \sigma_{jk} c_{kl} \qquad (j, l = 1, 2, \ldots, n). \tag{77} \]

In (59), since $c_{kl} = c_{lk}$, interchange k and l; multiply through by $\sigma_{jk}$, and
sum over k to obtain

\[ \sum_{k=1}^{n} \sigma_{jk} c_{kl} = \begin{cases} -c_1 \Delta_1 \sigma_{j1} & (l = 1) \\ -\Delta_l (c_{l-1} \Delta_l \sigma_{j,l-1}) & (l = 2, 3, \ldots, n-1) \\ c_{n-1} \Delta_{n-1} \sigma_{j,n-1} & (l = n). \end{cases} \tag{78} \]

Using (36) in (78), remembering notation (41) and (51), shows that

\[ \sum_{k=1}^{n} \sigma_{jk} c_{kl} = \delta_{jl} - g_l \qquad (j, l = 1, 2, \ldots, n). \tag{79} \]

Substituting (79) into (77) shows that (34) holds, which is what was to be
proved. These results can be summarized as a theorem.


THEOREM 4. If $[\sigma_{jk}]$ satisfies law (21), then a necessary and sufficient
condition for it to be nonsingular is that $d_{k,k+1}^2 > 0$ $(k = 1, 2, \ldots, n - 1)$
and that $\sum_{j} g_j \sigma_{jk}$ be different from zero for at least one value of k. Then
$[\sigma^{jk}]$ is given by formula (61).
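
The construction summarized in Theorem 4 can be checked numerically. The
following is a minimal sketch in Python, not part of the original paper. The
function names (simplex_cov, simplex_inverse, matmul) and the example matrix
are assumptions made for illustration: the covariances are those of
hypothetical scores s_j = W_j − cW_n, where W has uncorrelated increments, so
that the matrix has the differencing property (36) on which the derivation
rests.

```python
def simplex_cov(t, c):
    """Covariances of hypothetical scores s_j = W_j - c*W_n (assumed example).

    W has uncorrelated increments with Var(W_j) = t[j] (t increasing), so the
    resulting matrix satisfies the differencing property (36).
    """
    n = len(t)
    return [[min(t[j], t[k]) - c * (t[j] + t[k]) + c * c * t[-1]
             for k in range(n)] for j in range(n)]


def simplex_inverse(sigma):
    """Build the inverse from (39), (41), (51), (55), (60) and (61); needs n >= 2."""
    n = len(sigma)
    # (39): squared distances between neighboring points
    d2 = [sigma[k][k] - 2 * sigma[k][k + 1] + sigma[k + 1][k + 1]
          for k in range(n - 1)]
    # (41): the coefficients b_k and c_k
    b = [(sigma[k + 1][k + 1] - sigma[k][k + 1]) / d2[k] for k in range(n - 1)]
    c = [1.0 / x for x in d2]
    # (51): the weights g_k
    g = [b[0]] + [b[k] - b[k - 1] for k in range(1, n - 1)] + [1.0 - b[-1]]
    # (55), evaluated with the first column of sigma
    alpha = 1.0 / sum(g[j] * sigma[j][0] for j in range(n))
    # (60): the tridiagonal matrix [c_kl]
    tri = [[0.0] * n for _ in range(n)]
    for k in range(n - 1):
        tri[k][k] += c[k]
        tri[k + 1][k + 1] += c[k]
        tri[k][k + 1] -= c[k]
        tri[k + 1][k] -= c[k]
    # (61): inverse = alpha * g g' + [c_kl]
    inv = [[alpha * g[k] * g[l] + tri[k][l] for l in range(n)]
           for k in range(n)]
    return inv, g, alpha


def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


sigma = simplex_cov([1.0, 2.0, 4.0, 7.0], 0.3)
inverse, g, alpha = simplex_inverse(sigma)
product = matmul(sigma, inverse)
```

Under these assumptions, product should reproduce the identity matrix up to
rounding error, and the weights g should sum to one, as (53) requires.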
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EQUATING TEST SCORES—A MAXIMUM LIKELIHOOD SOLUTION 


FREDERIC M. LORD
EDUCATIONAL TESTING SERVICE


Certain problems of equating are discussed. The maximum likelihood
solution is presented for the following special equating problem: Two tests,
U and V, are to be equated, making use of a third "anchor" test, W. The
examinees are divided into two random halves. Tests U and W are
administered to one half; tests V and W are administered to the other half.
It is assumed that any practice effect or other effect, exerted by U and V
on W, is the same for U and for V.


Two tests may be said to be equated for a given group when the score
scales on the two tests are so adjusted that both tests have the same
frequency distribution of true scores in the given group. [Flanagan (1) and
Gulliksen (2, pp. 296-304) give brief discussions of various methods of
equating.] If the tests are equally reliable, then both tests will also have
approximately the same frequency distribution of actual scores. As an
approximation, two equally reliable tests may be equated by changing the score
scale on either test in such a way that the distribution of actual scores
becomes the same for both tests. The equipercentile method of equating is
commonly used for this purpose.

If we wish to equate two equally reliable and otherwise approximately
parallel forms of the same test, it is often convenient to assume that the score
distributions of the two forms may differ somewhat in mean and variance,
but that any other differences in the shape of these distributions may be
ignored in practice. Under this assumption, the tests can be equated by simply
changing the origin and the size of the unit of measurement of either score
scale. If x and y are scores on two tests, the standardized scores
$(x - \mu_x)/\sigma_x$ and $(y - \mu_y)/\sigma_y$ ($\mu$ and $\sigma$ denote mean and standard deviation
in the population of examinees for which the tests are to be equated) both
have zero mean and unit variance; consequently, under the assumption outlined,
standardized scores are equated, by definition.

Under the assumption of the foregoing paragraph, which will be implicit
in all that follows, the only practical problem is to estimate $\mu_x$, $\mu_y$, $\sigma_x$,
and $\sigma_y$ for the population in which the two tests are to be used, so that the
scores on both tests can be standardized. An obvious procedure is to administer
test X to one random sample from this population and test Y to another
random sample, and to estimate the desired parameters from the usual
sample statistics.
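
The procedure just described can be sketched in a few lines of Python. This is
an illustration only, not part of the original paper: the function name
linear_equating and the two small score samples are hypothetical, and the
standard deviations are computed with divisor N, which is an assumption about
what "the usual sample statistics" means here.

```python
from statistics import fmean, pstdev


def linear_equating(x_scores, y_scores):
    """Return (slope, intercept) of the line y = slope * x + intercept that
    sets the standardized scores (x - mean_x) / sd_x and (y - mean_y) / sd_y
    equal to each other."""
    mean_x, mean_y = fmean(x_scores), fmean(y_scores)
    sd_x, sd_y = pstdev(x_scores), pstdev(y_scores)  # divisor N (assumed)
    slope = sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical scores from two random samples of examinees.
x_sample = [10, 12, 14, 16, 18]
y_sample = [25, 30, 35, 40, 45]

slope, intercept = linear_equating(x_sample, y_sample)
equivalent = slope * 15 + intercept  # score on Y equivalent to x = 15
```

With these made-up samples the two tests differ only in origin and unit, so a
score on X maps onto the Y scale by a single linear transformation.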
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The procedure just outlined, however, is not very efficient, since ipe 
fluctuations produce differences in ability between the two groups, and
these differences cause a bias in the equating. A more efficient method,
provided the practice effect is properly handled, is to administer both tests
to each examinee. Unfortunately, it is frequently not possible in practice
to obtain sufficient testing time to administer two full-length tests to each
examinee. A compromise procedure, suggested by Ledyard Tucker and
commonly used at Educational Testing Service, is to divide the examinees
into two random samples, each of which takes only one form of the test to


and the resulting equated 


equated scores obtained by the methods 
discussed here and by certain other methods are given in (4).] 


It might be thought that the best procedure w 
of the two forms to tl 


ould be to equate each 
siderably larger sampl 
test. An optimum eq 
is found by using the 


3 
ing errors than those obtained by ignoring the anchor 


andling the data in question 
ethod of estimation. The neces- 


used in Tucker’s procedure 
the assumptions made i 


Problem


Two tests, U and V, are to be equated, making use of a third anchor
test, W. The examinees are divided into two random halves, which will be
called the "a-group" and the "b-group." Tests U and W are administered
to the a-group; tests V and W are administered to the b-group. It is assumed
that any practice effect or other effect exerted by U and V on W is the same
for U and for V.


communication) have shown that 
hen test W is a part of U and of 
V have common items W, 
Notation

Consideration will be limited to the case where there are N examinees
in each half-group. Let $u_a$ and $w_a$ denote the scores of examinee a, who is in
the a-group, on tests U and W; let $v_b$ and $w_b$ similarly denote scores of
examinee b, who is in the b-group.

The symbols $\mu$, $\sigma$, and $\rho$ will be used to represent means, standard
deviations, and correlation coefficients, respectively, in the population. The
population referred to here and in what follows is the population of all
examinees from which the a- and b-group may be considered to be random
samples.

Sample means will be denoted by $\bar{u}$, $\bar{v}$, $\bar{w}$; sample standard deviations and
correlations by s and r with appropriate subscripts. Where the meaning
would otherwise be unclear, a single prime or a double prime will denote a
statistic from the a-group or from the b-group, respectively.


Assumption 


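It is assumed that the scores u, v, and w have a normal trivariate
distribution in the population. The joint distribution of $u_a$ and $w_a$ is thus

\[ f_a(u_a, w_a) = \frac{1}{2\pi \sigma_u \sigma_w \sqrt{1 - \rho_u^2}} \exp\left\{ -\frac{1}{2(1 - \rho_u^2)} \left[ \frac{(u_a - \mu_u)^2}{\sigma_u^2} + \frac{(w_a - \mu_w)^2}{\sigma_w^2} - 2\rho_u \frac{(u_a - \mu_u)(w_a - \mu_w)}{\sigma_u \sigma_w} \right] \right\}, \tag{1} \]

$\rho_u$ being the correlation between u and w. The joint distribution of $v_b$ and
$w_b$, denoted by $f_b(v_b, w_b)$, is the same as the foregoing except that u is
replaced by v and a by b.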


The Likelihood Function 


The likelihood of occurrence of the actually observed values of $u_a$ and
$w_a$ in the a-group is, by definition, $\prod_{a=1}^{N} f_a(u_a, w_a)$. Similarly, the likelihood
for the b-group is $\prod_{b=1}^{N} f_b(v_b, w_b)$. The product of these two is the likelihood
function (L) for all observed values in the data at hand. It will be convenient
to work with the logarithm of the likelihood function, which is readily found
to be

\[ \begin{aligned} \log L = {} & -2N \log 2\pi - N \log \sigma_u \sigma_v - 2N \log \sigma_w - \tfrac{1}{2} N \log (1 - \rho_u^2)(1 - \rho_v^2) \\ & - \frac{1}{2(1 - \rho_u^2)} \left[ \frac{1}{\sigma_u^2} \sum_a (u_a - \mu_u)^2 + \frac{1}{\sigma_w^2} \sum_a (w_a - \mu_w)^2 - \frac{2\rho_u}{\sigma_u \sigma_w} \sum_a (u_a - \mu_u)(w_a - \mu_w) \right] \\ & - \frac{1}{2(1 - \rho_v^2)} \left[ \frac{1}{\sigma_v^2} \sum_b (v_b - \mu_v)^2 + \frac{1}{\sigma_w^2} \sum_b (w_b - \mu_w)^2 - \frac{2\rho_v}{\sigma_v \sigma_w} \sum_b (v_b - \mu_v)(w_b - \mu_w) \right]. \end{aligned} \tag{2} \]

The likelihood function contains eight unknown population parameters:
$\mu_u, \mu_v, \mu_w, \sigma_u, \sigma_v, \sigma_w, \rho_u, \rho_v$. We wish to choose values of these parameters
that will maximize the likelihood of occurrence of the actually observed
sample. Consequently, we differentiate (2) with respect to each parameter
in turn and set each derivative equal to zero, at the same time placing a
circumflex above the symbol for each parameter to indicate that we are
now dealing with estimates of the parameters rather than with their true
values. Eight simultaneous equations in eight unknowns are thus obtained.
The Likelihood Equations 


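After some cancellation and rearrangement, the first three equations are

\[ \hat\mu_u - \hat\beta_{uw} \hat\mu_w = \bar{u} - \hat\beta_{uw} \bar{w}', \tag{3} \]

\[ \hat\mu_v - \hat\beta_{vw} \hat\mu_w = \bar{v} - \hat\beta_{vw} \bar{w}'', \tag{4} \]

\[ \frac{\hat\mu_w - \hat\beta_{wu} \hat\mu_u}{\hat\kappa_u} + \frac{\hat\mu_w - \hat\beta_{wv} \hat\mu_v}{\hat\kappa_v} = \frac{\bar{w}' - \hat\beta_{wu} \bar{u}}{\hat\kappa_u} + \frac{\bar{w}'' - \hat\beta_{wv} \bar{v}}{\hat\kappa_v}, \tag{5} \]

where $\hat\kappa_u = 1 - \hat\rho_u^2$, $\hat\kappa_v = 1 - \hat\rho_v^2$, and each $\hat\beta$ is a regression coefficient;
for example, $\hat\beta_{uw} = \hat\rho_u \hat\sigma_u / \hat\sigma_w$.

Multiplying (3) by $\hat\beta_{wu} / \hat\kappa_u$, multiplying (4) by $\hat\beta_{wv} / \hat\kappa_v$, and adding both
products to (5), we obtain, after simplification,

\[ \hat\mu_w = \bar{w}, \tag{6} \]

where $\bar{w} = \tfrac{1}{2}(\bar{w}' + \bar{w}'')$ is the observed mean of w in the combined a- and
b-group. Equation 6 presents the maximum likelihood estimate of $\mu_w$.

Substituting (6) in (3) and in (4), we obtain, after simplification,

\[ \hat\mu_u = \bar{u} - \hat\beta_{uw} D, \tag{7} \]

\[ \hat\mu_v = \bar{v} + \hat\beta_{vw} D, \tag{8} \]

where $D = \tfrac{1}{2}(\bar{w}' - \bar{w}'')$.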

nd for Ê, and Bs 


(9) 
a similar equation for v instead of и, 


i | 1 a x 
a— Bl СИ ~ 28.2.۵ | + ==, (11) 
and a fifth equation like (11) but with v instead of y, In 


the foregoing equations,

\[ S_u^2 = \sum_a (u_a - \hat\mu_u)^2 / N, \tag{12} \]

\[ C_{uw} = \sum_a (u_a - \hat\mu_u)(w_a - \hat\mu_w) / N, \tag{13} \]

and so forth.
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Multiply (9) by 2,/9: and subtract from (11) to obtain the result 
Bu (85 ёс.) Сы _ 0. 


fu \ Ow бы 6,6, 


(4) 


Multiply (14) by ff, write out £; and д, in terms of р, , and simplify to obtain 


Sesapa = Cu, . (15) 
This may be rewritten

\[ \hat\beta_{uw} = C_{uw} / {S'_w}^2. \tag{16} \]

By a well-known formula, (12) can be rewritten

\[ S_u^2 = s_u^2 + (\bar{u} - \hat\mu_u)^2, \tag{17} \]

where $s_u^2 = \sum_a (u_a - \bar{u})^2 / N$, $s_u$ being the observed standard deviation of u.
From (17) and (7),

\[ S_u^2 = s_u^2 + \hat\beta_{uw}^2 D^2. \tag{18} \]

Similarly,

\[ {S'_w}^2 = {s'_w}^2 + D^2, \tag{19} \]

\[ C_{uw} = c_{uw} + \hat\beta_{uw} D^2, \tag{20} \]

and so forth, where $c_{uw} = \sum_a (u_a - \bar{u})(w_a - \bar{w}') / N$ is the observed covariance
of u and w in the a-group.


Substituting (19) and (20) into (16), we find after simplification

\[ \hat\beta_{uw} = c_{uw} / {s'_w}^2. \tag{21} \]

The expression on the right is the observed regression coefficient of u on w
in the a-group, so we may write finally

\[ \hat\beta_{uw} = b_{uw}. \tag{22} \]
From (9), (18), (20), and (22) 
бы = (6: — bus) = (1—1) = si, (23) 


where ¢2.. = uka, and si-w» is the observed standard error of estimate in 


the a-group. | и | 
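Finally, substitute (16) into (10) and simplify to obtain

\[ \hat\sigma_w^2 = \tfrac{1}{2} \bigl( {S'_w}^2 + {S''_w}^2 \bigr) = s_w^2, \tag{24} \]

where $s_w^2$ is the observed variance of w in the combined a- and b-group, i.e.,

\[ s_w^2 = \Bigl[ \Bigl( \sum_a w_a^2 + \sum_b w_b^2 \Bigr) \Big/ 2N \Bigr] - \bar{w}^2. \tag{25} \]

The writer is indebted to William H. Angoff for this simplified proof of (24).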
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The Maximum Likelihood Estimates 


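The set of eight equations, (3), (6), (22), (23), (24), and three equations
in v analogous to those in u, is sufficient for the practical calculation of
the maximum likelihood estimates of all the unknown parameters. A more
convenient set of eight equations, readily derived from these, is

\[ \hat\mu_w = \bar{w}, \tag{26} \]
\[ \hat\mu_u = \bar{u} + b_{uw} (\bar{w} - \bar{w}'), \tag{27} \]
\[ \hat\mu_v = \bar{v} + b_{vw} (\bar{w} - \bar{w}''), \tag{28} \]
\[ \hat\sigma_w^2 = s_w^2, \tag{29} \]
\[ \hat\sigma_u^2 = \hat\sigma_{u \cdot w}^2 + \hat\beta_{uw}^2 \hat\sigma_w^2 = s_u^2 + b_{uw}^2 (s_w^2 - {s'_w}^2), \tag{30} \]
\[ \hat\sigma_v^2 = s_v^2 + b_{vw}^2 (s_w^2 - {s''_w}^2), \tag{31} \]
\[ \hat\sigma_{uw} = \hat\sigma_u \hat\sigma_w \hat\rho_u = \hat\beta_{uw} \hat\sigma_w^2 = b_{uw} s_w^2, \tag{32} \]
\[ \hat\sigma_{vw} = b_{vw} s_w^2. \tag{33} \]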

Population covariances. In the fore- 
primes have been attached for the 


cept 10 and s, , these last two values 
- and b-group. 


Equating

Granting the assumptions made from the start, a good equation for
equating is

\[ (v - \mu_v) / \sigma_v = (u - \mu_u) / \sigma_u, \tag{34} \]

or, after rearranging,

\[ v = Au + B, \tag{35} \]

where

\[ A = \sigma_v / \sigma_u, \tag{36} \]
\[ B = \mu_v - A \mu_u. \tag{37} \]

In (36) and (37), A and B are expressed in terms of population parameters,
which are unknown. We wish to use estimates of A and B in (35). Since the
maximum likelihood estimate of a certain function of the parameters is the
same as that function of the maximum likelihood estimates of the parameters,
the equation to use for equating is

\[ v = \hat{A} u + \hat{B}, \tag{38} \]

where $\hat{A} = \hat\sigma_v / \hat\sigma_u$; $\hat{B} = \hat\mu_v - \hat{A} \hat\mu_u$, the values of $\hat\mu_u$, $\hat\mu_v$, $\hat\sigma_u$, and $\hat\sigma_v$ being
computed from the data by means of equations 27, 28, 30, and 31.

The formulas for equating thus obtained by the maximum likelihood 
method differ from those of Tucker, as discussed by Gulliksen, chiefly as 
a result of the fact that Tucker’s procedure calls for estimating the perform- 
ance of the b-group on test U, whereas the present procedure calls for esti- 
mating the performance of the entire population on both tests U and V. 
The present development is based on the assumption that the two groups 
tested are random samples from the same population. The assumptions made 
in Tucker’s development do not require this, but they do impose considerable 
restriction on the nature of the differences between the two groups. 


Numerical Example 


The following illustrative example is based on real data taken from
Karon's empirical study of equating methods (3). The raw data are given
in the top half of Table 1; the necessary maximum likelihood estimates,
computed by equations (27), (28), (30), and (31), are given in the bottom
half. Each group contains a random sample of 600 examinees. The final
equation, obtained from (38),

\[ v = .997u + 0.32, \tag{39} \]

gives the raw score (v) on test V that is equivalent to any given score (u)
on test U.


TABLE 1
Raw Data and Maximum Likelihood Estimates Needed for Equating

                            Group a             Group b        Combined groups
                        Test U    Test W    Test V    Test W       Test W

Mean (ū, v̄, w̄)         117.85     34.36    115.33     33.42        33.89
Variance (s²)          1129.62    116.81   1109.65    114.89       116.07
Regression on w         2.6744              2.6479

μ̂                      116.59              116.58
σ̂²                    1124.34             1117.92


If test W had not been administered, the final equation would have been

\[ v = (s_v / s_u) u + \bar{v} - (s_v / s_u) \bar{u} = .991u - 1.47. \tag{40} \]

The use of test W provides the information that the b-group is probably
slightly less competent and slightly less variable than the a-group (these
differences having arisen solely because of sampling fluctuations). The
maximum likelihood estimates in Table 1 and the resulting equation 39 take
this sampling fluctuation into account, whereas equation 40 does not.
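
The arithmetic of the numerical example can be retraced with a short script.
This is a sketch added for illustration, using only the raw values printed in
the top half of Table 1; the variable names are assumptions. The combined
variance of w is obtained from the two group variances and the half-difference
of the group means, which is the usual identity for two groups of equal size.

```python
from math import sqrt

# Raw data from the top half of Table 1 (group a: tests U and W; group b: V and W).
u_bar, w_bar_a, var_u, var_w_a, b_uw = 117.85, 34.36, 1129.62, 116.81, 2.6744
v_bar, w_bar_b, var_v, var_w_b, b_vw = 115.33, 33.42, 1109.65, 114.89, 2.6479

# Combined-group mean and variance of the anchor test W.
w_bar = (w_bar_a + w_bar_b) / 2
half_diff = (w_bar_a - w_bar_b) / 2
var_w = (var_w_a + var_w_b) / 2 + half_diff ** 2

# Maximum likelihood estimates, equations (27), (28), (30), (31).
mu_u = u_bar + b_uw * (w_bar - w_bar_a)
mu_v = v_bar + b_vw * (w_bar - w_bar_b)
var_u_hat = var_u + b_uw ** 2 * (var_w - var_w_a)
var_v_hat = var_v + b_vw ** 2 * (var_w - var_w_b)

# Equating line (38): v = A * u + B.
A = sqrt(var_v_hat / var_u_hat)
B = mu_v - A * mu_u

# Equating line (40), which ignores the anchor test.
A_no_anchor = sqrt(var_v / var_u)
B_no_anchor = v_bar - A_no_anchor * u_bar
```

Because the inputs are the rounded values printed in the table, the results
can be expected to agree with the bottom half of Table 1 and with equations
39 and 40 only to within rounding.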
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AXIOMS OF A THEORY OF DISCRIMINATION LEARNI 


FRANK RESTLE
STANFORD UNIVERSITY


Analysis of an empirical theory into a formal system with specified
primitive notions and axioms has the advantage of making it clear what
deductions from the theory are permissible, and clarifying the internal
structure of the theory. An example of such analysis is presented in this
paper.


Learning theories recently published by Estes and his associates (3, 4, 5) 
and Bush and Mosteller (1) have been characterized by mathematical 
formulation and reasoning. The writer has offered a similar theory designed 
for the analysis of two-choice discrimination learning. This new theory, 
using a strong simplifying assumption, yields several empirical predictions 
which have, in the main, been verified (9). 

According to this theory the subject is faced on each trial with a collec- 
tion of cues; some are relevant to getting reward and others irrelevant. 
On each trial of training some relevant cues are newly conditioned to the 
correct response and some irrelevant cues are newly adapted. A conditioned 
cue contributes to a correct response. An adapted cue becomes non-functional 
and does not directly affect the choice reaction. 

The probability that a relevant cue will be conditioned on any trial
(given that it has not been conditioned on a previous trial) will be denoted
by θ. Since θ is constant from trial to trial and the same for all cues, the
learning functions here are the same as the conditioning functions in the
work of Estes and his associates (3, 5, and the "equal-θ approximation"
case in 4.)

The fundamental assumption of the theory deals with θ. This assumption
is that θ is the relative weight of relevant cues in the problem. The more
relevant cues there are in the problem, the greater is the probability that
any given relevant cue will be conditioned and that any given irrelevant
cue will be adapted. By this simplifying assumption it is possible to make
the theory unusually determinate.

In the earlier paper on this theory (9) a number of quantitative empirical 

"тыз puper is adapted from erts а PD. die euet a To срт 
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laws were developed which were tested against experimental data. In general, 
the laws were verified.

In the present paper a more precise and complete statement of the theory
is made. Using only terms definable within the language of set-theory and
logic, a complete list of primitive notions is given and the axioms are stated.
Deductions are carried out entirely by the methods of formal mathematics,
without recourse to psychological intuition or "good sense."

Before presenting the system it may be useful to describe the
mathematical notions to be used. A binary relation is a relation between two
entities. By a set is meant any arbitrary collection of things. In the formula
f(x, y) = z the term z is the value of the binary function f. If z is a real
number, we say that f is real-valued. An ordered couple is a set which has two
members, with the restriction that specifying the set requires not only naming
the members but also indicating what order they come in. If (x, y) is an
ordered couple and x ≠ y, then (x, y) ≠ (y, x).

The usual set-theory notation is used; if X and Y are sets, X ∪ Y
includes everything which is in either X or Y, X ∩ Y includes the elements
which are in both X and Y, and X − Y includes all the elements which
are in X and are not in Y. The empty or null set is called Λ. In the body
of the paper, capital letters are used to denote the sets and the one relation
used; lower-case letters designate functions, integers, and variables. One
function is given the designation θ to follow earlier usage (4, 5).


Primitive Notions 


This system of discrimination learning is based on seven primitive
notions, K, S*, Q, w, c, a, and p. K is a set, S* is a set of ordered couples,
Q is a binary relation, w is a unary real-valued function, and c, a, and p
are binary real-valued functions.


The set K is intended to be interpreted as the collection of cues. A cue
is anything, concrete or abstract, present, past or future, of any description,
to which the subject can learn to make a differential response. Obviously,
at any given time there are cues to which the subject does not make responses;
otherwise, there would be no learning. But if the subject can learn a
differential response to something, by some training method, then that thing
is a cue. Some cues are relatively simple energy sources. Some subjects can
learn to respond to spatial or temporal patterns of objects or events; some
produce reactions, overt, perceptual, or "thinking," which they can
discriminate. Accounts of mediating
processes in work by © 
m laf (i0) can be found in work b м 


* iai В 
The set S* is intended to be interpreted as any collection of two-choice
discrimination problems, all of which involve the same pair of choice reactions.
A problem S is uniquely associated with a pair of sets of cues: the set of
relevant cues, R, which the subject can use to predict reward, and the set
of irrelevant cues, I, which are uncorrelated with reward and therefore
cannot be used to predict reward.

If S is a problem in S* and n is a positive integer, then SQn is interpreted 
as the statement, “problem S appears on trial n." This is true if the subject 
must make a choice reaction in problem S on the nth trial. 

If k is a cue, w(k) is interpreted as the weight of cue k. According to 
Axiom D2, w is a discrete probability distribution defined over the class K 
of cues. 

If k is a cue and n is a positive integer, then c(k, n) is the probability
that k is conditioned to the correct response at the beginning of the nth trial.
If k is a cue then a(k, n) is the probability that k is adapted at the beginning
of the nth trial.

Before stating the axioms of this system we define θ(S) as the relative
weight of relevant cues in problem S. This term will appear later in the
learning functions of Axiom D7.


Definition: If S = (R, I) is in S*, then

$$\theta(S) = \frac{\sum_{k \in R} w(k)}{\sum_{k \in (R \cup I)} w(k)}.$$
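The definition of θ(S) can be illustrated with a minimal sketch. The cue names and weights below are hypothetical, and the function name `relative_weight` is introduced here only for illustration; it is not part of the original system.

```python
def relative_weight(weights, relevant, irrelevant):
    """theta(S): weight of the relevant cues over the weight of all cues in S = (R, I)."""
    if relevant & irrelevant:
        # Axiom D4: no cue may be both relevant and irrelevant in one problem.
        raise ValueError("R and I must be disjoint")
    rel = sum(weights[k] for k in relevant)
    total = rel + sum(weights[k] for k in irrelevant)
    return rel / total


# Hypothetical cue weights (assumed for illustration); they sum to 1, as Axiom D2 requires.
weights = {"brightness": 0.2, "pitch": 0.1, "position": 0.3, "odor": 0.4}

# A problem in which brightness is relevant and position and odor are irrelevant.
theta = relative_weight(weights, {"brightness"}, {"position", "odor"})
```

Note that cues outside R ∪ I (here, pitch) do not enter θ(S) at all.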


Axioms 

Definition: A system (K, S*, Q, w, c, a, p) satisfying Axioms D1-D8 is called a system
of simple discrimination learning.

Axiom D1. K and S* are non-empty, at most denumerable sets.

Axiom D2. If k is in K, w(k) > 0, and $\sum_{k \in K} w(k) = 1$.

Axiom D3. If S = (R, I) is in S*, then R and I are subsets of K.

Axiom D4. If S = (R, I) is in S*, then the intersection of R and I is empty.

Axiom D5. If S_1 and S_2 are distinct members of S*, if n is a positive integer, and
if S_1 Q n, then not S_2 Q n.

Axiom D6. If S = (R, I) is in S*, then for all k in R ∪ I, c(k, 1) = a(k, 1) = 0.

Axiom D7. If S = (R, I) is in S* and n is a positive integer and SQn, then:
If k is in R, then c(k, n + 1) = c(k, n) + θ(S)[1 − c(k, n)] and a(k, n + 1) = a(k, n).
If k is in I, then c(k, n + 1) = c(k, n) and a(k, n + 1) = a(k, n) + θ(S)[1 − a(k, n)].
Otherwise, c(k, n + 1) = c(k, n) and a(k, n + 1) = a(k, n).

Axiom D8. If S = (R, I) is in S* and n is a positive integer, then

$$p(S, n) = \frac{1}{2} \cdot \frac{\sum_{k \in (R \cup I)} w(k) - \sum_{k \in I} a(k, n)\, w(k) + \sum_{k \in R} c(k, n)\, w(k)}{\sum_{k \in (R \cup I)} w(k) - \sum_{k \in I} a(k, n)\, w(k)}.$$


Axiom D1 eliminates the trivial case in which either there are no cues 
or there is no problem and avoids mathematical difficulties by keeping K 
and S* denumerable at most. Axiom D2 states that w is a discrete probability 
function. Axiom D3 states that the relevant and irrelevant cues in any 
problem are cues in the class K. Axiom D4 states that no cue can be both 
relevant and irrelevant in the same problem. Axiom D5 states that only 
one problem may occur on a given trial. Axiom D6 states that the system 
deals with a theoretically “naive” subject who, at the beginning of training 
(trial 1), had neither conditioned nor adapted to any of the cues involved.
Axiom D7 states the laws of conditioning and adaptation, which were
discussed above and in the earlier paper on this subject (9). Axiom D8 is the
"law of performance," giving p, the probability of a correct response, as a
function of the number of conditioned and adapted cues. This axiom
states that p is the proportion of non-adapted (i.e., still-functional) cues
which are conditioned plus one-half the proportion of non-adapted cues
which are unconditioned.
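The laws of conditioning, adaptation, and performance can be sketched as a small simulation. This is an illustrative reading only: Axiom D8 is implemented from its verbal statement (conditioned non-adapted cues plus one-half of the unconditioned non-adapted cues, as a proportion of non-adapted cues), and the cue names, weights, and function names are assumptions made for the example.

```python
def step(c, a, weights, problem):
    """One trial of `problem` = (R, I) under Axiom D7; returns updated (c, a)."""
    R, I = problem
    theta = sum(weights[k] for k in R) / sum(weights[k] for k in R | I)
    c_next, a_next = dict(c), dict(a)
    for k in R:  # relevant cues are conditioned
        c_next[k] = c[k] + theta * (1 - c[k])
    for k in I:  # irrelevant cues are adapted
        a_next[k] = a[k] + theta * (1 - a[k])
    return c_next, a_next  # all other cues are left unchanged


def p_correct(c, a, weights, problem):
    """Probability of a correct response, following the verbal form of Axiom D8."""
    R, I = problem
    total = sum(weights[k] for k in R | I)
    adapted = sum(a[k] * weights[k] for k in I)
    conditioned = sum(c[k] * weights[k] for k in R)
    return 0.5 * (total - adapted + conditioned) / (total - adapted)


# Hypothetical cues and weights (assumed); theta(S) = 0.25 for this problem.
weights = {"k1": 0.25, "k2": 0.25, "k3": 0.5}
problem = (frozenset({"k1"}), frozenset({"k2", "k3"}))

# Axiom D6: the naive subject starts with nothing conditioned or adapted.
c = {k: 0.0 for k in weights}
a = {k: 0.0 for k in weights}

curve = []
for n in range(1, 31):
    curve.append(p_correct(c, a, weights, problem))
    c, a = step(c, a, weights, problem)
```

In this sketch the curve starts at chance (one-half) on trial 1 and rises toward 1 as relevant cues are conditioned and irrelevant cues are adapted.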


Theorems 


The theorems to be proved could not be proved rigorously with the
system in (9). The equations derived in Theorems 2, 3, and 4 were compared
directly with experiments.

The first empirical problem of the theory is the evaluation of the learning
[...] data. This is accomplished by [...] relating p(S, n) to θ(S). It is
[...] monotonic with respect to both θ(S) [...] to determine θ knowing the
[...] corresponds to p(S, n). Since such curve-[...] total number of errors
expected, [...] learning experiment is continued until the subject [...],
the total errors made can be [...] its corollaries make it possible to
evaluate θ(S) in practice.

The second empirical problem has to do with the combination of cues.
Experimentally, we observe [...] representative sub[...] between, say, black
and white, and from th[...] the same apparatus we observe a second [...]
discriminate, for example, high and low pitches, and we determine θ(S_2).
The two sets of cues, brightness and pi[...] not probably affect one another
perce[...] problem is run in which both brigh[...] example, the subject [...]
and high pitch from white and low pitch. Theorem 2 makes [...]
problem are irrelevant in the more difficult problem. The more difficult
problem is constructed from the easier one by shifting some cues from the
set of relevant cues into the set of irrelevant cues. To predict transfer
performance we first determine the θ values of the two problems by running
them separately with naive subjects. From these values and knowledge of
the number of trials of training on the easy problem, we can predict p(S_hard, n)
for all trials on the hard problem (7, 9). The required formula is derived in
Theorem 3, and the total number of errors made in transfer is derived in 
Corollary 3.1. 

The “converse” of the experiment discussed in Theorem 3 is an experiment
in which the subject is first trained on the difficult problem and is
then transferred to an easier one of the same sort (9). The formula for
predicting performance, based on knowledge of θ(easy), θ(hard), and the number
of pretraining trials on the hard problem, is given in Theorem 4.

Theorems 2, 3, and 4 make exact quantitative predictions of expected 
performance curves. Testing the predictions against empirical results does 
not involve curve-fitting and the use of arbitrary empirical constants. The 
predicted curve can in principle be drawn before any subjects are run on 
the test problem, and the theory is not confirmed unless the test performance 
corresponds to a particular learning curve predicted. 

Since the proofs of the theorems are elementary in principle and some- 
what tedious, only the method of proof will be given. The careful reader 
can verify for himself that entirely formal proofs are possible. 

THEOREM 1. If S is in S* and SQj for all positive integers j < n, then
[using θ as an abbreviation for θ(S)]

$$p(S, n) = 1 - \frac{\tfrac{1}{2}(1 - \theta)^{n-1}}{\theta + (1 - \theta)^{n}}.$$

PROOF. We note that if k is in R, c(k, n) = 1 − (1 − θ)^(n−1) and if k is
in I, a(k, n) = 1 − (1 − θ)^(n−1). The theorem is obtained by elementary
algebra: the above values are substituted into Axiom D8, all terms are divided
by $\sum_{k \in (R \cup I)} w(k)$, and the definition of θ is employed to simplify.
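The algebra of this proof can be checked numerically. The sketch below assumes the closed form p(S, n) = 1 − ½(1 − θ)^(n−1)/[θ + (1 − θ)^n] and compares it with a direct recursion on Axioms D6 to D8, after dividing all weights by the total weight of the cues in the problem; the function names are introduced here for illustration only.

```python
def p_closed(theta, n):
    """Closed form assumed for Theorem 1."""
    return 1 - 0.5 * (1 - theta) ** (n - 1) / (theta + (1 - theta) ** n)


def p_recursive(theta, n):
    """Iterate Axiom D7 from the naive state (Axiom D6), then apply Axiom D8."""
    c = a = 0.0  # every relevant cue shares c; every irrelevant cue shares a
    for _ in range(n - 1):
        c += theta * (1 - c)
        a += theta * (1 - a)
    # After normalization the relevant weight is theta, the irrelevant weight 1 - theta.
    functional = theta + (1 - theta) * (1 - a)  # weight of non-adapted cues
    return 0.5 * (functional + theta * c) / functional


max_gap = max(
    abs(p_closed(t, n) - p_recursive(t, n))
    for t in (0.1, 0.3, 0.7)
    for n in range(1, 40)
)
```

Under these assumptions `max_gap` stays at the level of floating-point rounding, which is the numerical counterpart of the "elementary algebra" referred to in the proof.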


COROLLARY 1.1. Under the conditions of Theorem 1, p(S, n) is a monotonic
non-decreasing function of n and a monotonic increasing function of θ.
PROOF. This follows immediately from the theorem.


COROLLARY 1.2. Under the above conditions,

$$\sum_{n=1}^{\infty} [1 - p(S, n)] = \frac{1}{2} + \frac{\tfrac{1}{2}\log\theta}{(1 - \theta)\log(1 - \theta)}.$$

PROOF. We first estimate p(S, n) by the continuous function

$$p'(S, t) = 1 - \frac{\tfrac{1}{2}(1 - \theta)^{t-1}}{\theta + (1 - \theta)^{t}},$$

and integrate 1 − p'(S, t) by using the substitution y = (1 − θ)^t.
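Because the corollary rests on replacing a sum by an integral, it may help to see how close the two are. The sketch below assumes the corollary reads ½ + ½ log θ / [(1 − θ) log(1 − θ)], that is, the first-trial error of ½ plus the integral of 1 − p'(S, t) from t = 1 onward; the function names are hypothetical. Since 1 − p'(S, t) is decreasing in t, the integral form can exceed the exact sum by at most the first term, ½.

```python
import math


def expected_errors_sum(theta, terms=2000):
    """Direct (truncated) sum of 1 - p(S, n), using the closed form of Theorem 1."""
    total = 0.0
    for n in range(1, terms + 1):
        total += 0.5 * (1 - theta) ** (n - 1) / (theta + (1 - theta) ** n)
    return total


def expected_errors_integral(theta):
    """Integral estimate, as the corollary is assumed to read."""
    return 0.5 + 0.5 * math.log(theta) / ((1 - theta) * math.log(1 - theta))


theta = 0.2
exact = expected_errors_sum(theta)
approx = expected_errors_integral(theta)
gap = approx - exact  # non-negative and no larger than 0.5 under these assumptions
```

The gap is small relative to the total when θ is small (slow learning, many errors), which is the case in which an estimate of θ from total errors is most useful.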
THEOREM 2. If S = (R, I), S_1 = (R_1, I_1) and S_2 = (R_2, I_2) are in
S* and if $\sum_{k \in R} w(k) = \sum_{k \in R_1} w(k) + \sum_{k \in R_2} w(k)$, and if
$\sum_{k \in I_1} w(k) = \sum_{k \in I_2} w(k) = \sum_{k \in I} w(k)$, then

$$1 - \theta(S) = \frac{[1 - \theta(S_1)][1 - \theta(S_2)]}{1 - \theta(S_1)\,\theta(S_2)}.$$

PROOF. The proof follows immediately from the definition of θ.
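The combination rule can be verified directly from the definition of θ. The sketch assumes the reading in which the combined problem has the relevant weight of both component problems together and all three problems share the same irrelevant weight; the numerical weights are hypothetical.

```python
def theta(relevant_weight, irrelevant_weight):
    """Relative weight of relevant cues, from the definition of theta(S)."""
    return relevant_weight / (relevant_weight + irrelevant_weight)


# Hypothetical weights (assumed): e.g. brightness cues, pitch cues, shared irrelevant cues.
r1, r2, i = 0.15, 0.10, 0.50

t1 = theta(r1, i)            # theta(S_1)
t2 = theta(r2, i)            # theta(S_2)
t_combined = theta(r1 + r2, i)  # theta(S), both sets of cues relevant

lhs = 1 - t_combined
rhs = (1 - t1) * (1 - t2) / (1 - t1 * t2)
```

The two sides agree to rounding error, so θ for the combined problem can be predicted from the two separately measured values without knowing the individual weights.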


THEOREM 3. (i) If S_1 = (R_1, I_1) and S_2 = (R_2, I_2) are in S* and if
R_2 is a subset of R_1 and I_1 is a subset of I_2, and if

$$\sum_{k \in (R_1 \cup I_1)} w(k) = \sum_{k \in (R_2 \cup I_2)} w(k),$$

and (ii) if for all i ≤ n, S_1 Q i, and for all n + j, S_2 Q (n + j), then
[using θ_1 and θ_2 as abbreviations for θ(S_1) and θ(S_2)]

$$p(S_2, n + j) = 1 - \frac{\tfrac{1}{2}(1 - \theta_2)^{j-1}\,[\theta_1 - \theta_2 + (1 - \theta_1)^{n+1} + \theta_2 (1 - \theta_1)^{n}]}{\theta_2 + (1 - \theta_2)^{j-1}\,[\theta_1 - \theta_2 + (1 - \theta_1)^{n+1}]}.$$

PROOF. Let k be a cue in R_2. Since R_2 is a subset of R_1, k is also in R_1.
At the beginning of trial n + 1, for all k in R_2, c(k, n + 1) = 1 − (1 − θ_1)^n.
After j − 1 further trials on the second problem,
c(k, n + j) = 1 − (1 − θ_1)^n (1 − θ_2)^(j−1). This is the conditioning of all cues in R_2.
At the beginning of trial n + 1, for all cues in I_1, a(k, n + 1) = 1 − (1 − θ_1)^n.
For cues which are in I_2 but are not in I_1, a(k, n + 1) = 0.
(The fact that these latter cues have been conditioned is of no importance,
since they are not relevant.) The theorem is obtained by using Axiom D7 to
determine c(k, n + j) and a(k, n + j), substituting these values into Axiom D8,
dividing by $\sum_{k \in (R_2 \cup I_2)} w(k)$, and collecting like terms.


COROLLARY 3.1. Under the conditions of Theorem 3,

[equation illegible in the source scan]

where A = [...] and B = θ_1 − θ_2 + (1 − θ_1)^(n+1).

PROOF. Note that by Theorem 3, 1 − p(S_2, n + j) = [...]


THEOREM 4. Given the same conditions as under (i) in Theorem 3, but if
for all i ≤ n, S_2 Q i, and if for all n + j, S_1 Q (n + j), then

[equation illegible in the source scan]

PROOF. The proof is similar to that of Theorem 3.




Discussion 


Certain characteristics of the axiom system offered in this paper may 
require explanation. The extremely abstract nature of the axioms is designed 
to separate carefully the formal system from its psychological interpretation. 
This separation makes it possible to be sure that all needed assumptions 
have been explicitly stated. Axioms D1-D6 are formal in nature and do not
represent crucial psychological assumptions. However, if the required theo- 
rems are to be proved rigorously, such axioms are necessary (8). 

The purpose of the paper is to make clear the formal assumptions, not 
their empirical consequences. However, it may be noted that if the four 
primitive notions K, S*, Q, and w are defined operationally, the other three, 
c, a, and p, can be defined explicitly by using Axioms D7 and D8 as definitions.
If the notion of a cue can be made clear, there is not likely to be any difficulty 
with the notion of a class of discrimination problems, or the occurrence of 
a problem. Operational definition of w, the weight or probability of a cue, 
seems at first glance difficult, but since the theory makes it possible to evaluate 
θ for any problem, one can in principle measure the ratio of weights of any
two sets of cues. Thus, the measurement of w does not offer a theoretical 
difficulty, however complex the experimental manipulations may become. 

The empirical definition of a cue is roughly the following: k is a cue if 
and only if, when the subject is given appropriate training, then he learns 
to make differential responses based solely on k. Here appropriate training 
is the most efficient training program possible. Often we do not know what 
training program this is or how long training must be continued to get learning, 
with the result that empirical use of this definition is hindered. It does, 
however, give a fairly clear intuitive idea of the meaning of the term cue. 

To define S*, the set of problems, it is essential only to know what a 
cue is and to distinguish relevant from irrelevant cues. A cue is relevant in a 
particular problem if it can be used in that problem as the basis for consis- 
tently correct response. A cue is irrelevant if the problem is so designed 
that the cue cannot be used as the basis for consistently correct response. 

The relation of occurrence, Q, of a problem, does not take into account 
whether the subject makes a correct or incorrect response. Given the concept 
of a problem, the notion of occurrence of a problem is clear since it corresponds 
to the usual experimental notion of a trial (especially in non-correction type 
training where one run through the apparatus or situation is considered a 
trial). 

Another characteristic of this theory is the very strong assumption
identifying θ with the relative weight of relevant cues. Without this
assumption it would have been extremely difficult to evaluate the needed learning
parameters, and experimental tests would have been complicated immeasurably.
While one may be skeptical that such a convenient assumption would
be satisfied, it permits a coherent and powerful theory to be developed.
Having made a very useful simplifying assumption, the theorist can always
retreat when the data demand it.

Finally, it may be noted that this theory in its present form does not
account for that important class of experiments in which the relevant cues
are reversed, i.e., where the formerly correct cue becomes incorrect, and the
formerly incorrect cue becomes correct. Generalization to this field of data
is needed to broaden the empirical base of the theory.
THE OBJECTIVE DEFINITION OF SIMPLE STRUCTURE
IN LINEAR FACTOR ANALYSIS*


LEDYARD R TUCKER
PRINCETON UNIVERSITY
AND

Requirements for an objective definition of simple structure are
investigated and a number of proposed objective criteria are evaluated.
A distinction is drawn between exploratory factorial studies and confirmatory
factorial studies, with the conclusion drawn that objective definition of simple 
structure depends on study design as well as on objective criteria. A proposed 
definition of simple structure is described in terms of linear constellations. 
This definition lacks only a statistical test to compare with possible chance 
results. A computational procedure is also described for searching for linear 
constellations. This procedure is very laborious and might best be accom- 
plished on high-speed automatic computers. There is no guarantee that the 
procedure will find all linear constellations, but it probably would yield 
satisfactory results for well-designed studies. 


The principle of simple structure, proposed by Thurstone as a solution 
to the problem of indeterminacy of position of axes in the factorial structure, 
has received wide support and use in factor analysis. There have been, 
however, a variety of criticisms including (1) a skepticism regarding whether 
this principle of simplicity did, in reality, adequately parallel nature, and 
(2) a feeling of disturbance at the subjectivity involved both in theory and 
in application. The first problem, that of the validity of the simple structure 
concept, may be settled only by experimental studies. It is the purpose of 
this paper to assist in solving the second problem, that of subjectivity, by 
attempting to develop a more objective and operational view of the simple 
structure concept. 

Two major concepts of the nature of factors are used to justify the 
principle of simple structure. Thurstone’s views might best be summarized by 
the following quotations: "In the interpretation of mind we assume that 
mental phenomena can be identified in terms of distinguishable functions, 
which do not all participate equally in everything that mind does ··· No 

*This research was jointly supported by Princeton University, the Office of Naval
Research under contract N6onr-270-20, and the National Science Foundation under grant
NSF G-642. The author is especially indebted to Harold Gulliksen for his many exceedingly
helpful comments and suggestions made during the course of this development. A debt of
gratitude is also owed to Mrs. Gertrude Diederich, who performed many intricate
calculations in the experiments on computing procedures. The author further wishes to express
his appreciation to Frederic M. Lord and David R. Saunders, who read the manuscript
and made a number of very useful suggestions.

assumption is made about the nature of these functions, whether they are
native or acquired or whether they have a cortical locus.” (14, p. 57.) “Just 
as we take for granted that the individual differences in visual acuity are not 
involved in pitch discrimination, so we assume that in intellectual tasks 
some mental or cortical functions are not involved in every task. This is 
the principle of ‘simple structure’ or ‘simple configuration’ in the under- 
lying order for any given set of attributes." (14, p. 58.) Cattell (3) expresses 
a similar view. In contrast to the foregoing, Holzinger and Harman (5) 
express a variant view that factor analysis, as a branch of statistical analysis,
conveys information in the original data with an aim of parsimony which
should not be construed as a search for fundamental categories. Similarly,
Vernon (19) takes a position that “... it should be clear that a factor is a
construct which accounts for the objectively determined correlations between
tests, in contrast to a faculty which is a [...]
Others have taken views on either of thes[...] to some middle ground. Since
each of [...] yielding support for the desirability [...] definitions to follow
could be derived from either view and will not distinguish between them.
Some such view is [...] toward acceptance of the [...]


Studies and Simple Structure


[...] might best be conceived as a [...] isolated, separate studies. Each
[...] from previous studies and add [...] early studies in some domain, or
[...] applying primarily [...] to the more perfected [...]

A major premise of the present argument is that the objective definition
of a simple structure is dependent on both an adequate study design and on
objective analytic criteria. Not all factorial studies may possess a simple 
structure, only those studies involving an appropriate battery of measures 
made on an appropriate sample of individuals. Some requirements set forth 
by the analytic criteria may be met only in the study design. It is desirable, 
however, that there be a maximum of freedom in the design of factorial 
studies so as to fit as many situations as possible. For example, an experi- 
menter should be in a position to test objectively hypotheses concerning the 
relations of complex measures to factorially simpler ones. Thus, it is desirable 
that the analytic criteria permit complex variables and not limit the study
design to factorially pure measures. The factorial simple structure needs 
to be unambiguously present, however, in the data. This is a function of the 
study design. 


Requirements for Objective Definition of Simple Structure 


Following is a proposed list of requirements for satisfactory objective 
criteria for simple structure. These requirements should be interpreted as 
applying to individual studies since invariance of factorial results over 
various changes in the population of individuals sampled and in the battery 
of measures is a matter for experimental verification. It will be noted, 
however, that small variations of factor loadings and projections from ideal 
values are permitted. These small variations from ideal might result either 
from random sampling error peculiar to the sample of individuals measured or 
from errors of approximation in the basic factorial model. 

A second point to be noted is that a choice is made as to kind of pro- 
jection employed relating test vectors to factors. In the case of correlated 
factors, orthogonal projections of test vectors on normals to hyperplanes 
are used. These orthogonal projections for a particular factor depend upon 
location of only the hyperplane for that factor and upon the test vectors. 
They are independent of the locations of all other hyperplanes. A further 
reason for this choice as to type of projection is that the square of this type 
of projection can be interpreted to represent the independent contribution 
of the factor to the variance of the variable. 


a. Basic requirements 


1. Emphasis is placed on a maximum concentration of vectors along hyperplanes, 
that is, on a maximum number of zero projections on normals to the hyper- 
planes, allowance being made for small variations in observed projections. 

2. The vectors interpreted as being in each hyperplane span a space of (r — 1) 
dimensions, allowance being made for small variations in observed projections, 
where r is the number of dimensions in the common-factor space. 

3. Exactly as many simple structure factors are obtained as there are dimensions 
in the common-factor space. 
b. Types of freedom explicitly permitted

4. Oblique factors are permitted.

5. A minority of highly complex measures whose vectors have projections on
several, up to all, factors is permitted in the battery being analyzed.

c. Operational requirements

6. The choice as to which projections are to be interpreted as zero is made on
objective grounds.

7. An objectively determined best fit to the data is involved.

8. The best fit is unbiased in the limiting [...]

9. Statistical tests exist which indicate th[...]

10. An automatic computational procedure is available for use with any particular
study.


[...] two-dimensional [...] be interpreted as [...] here in this space [...]
Whenever it seems advisable, a [...] that only orthogonal factors were
permitted. This could be a function [...] study being analyzed or of the [...]
mentioned in this article. It is desirable for experimenters to be able to check
in an objective fashion on hypotheses related to complex variables. Allowance 
for measures that have loadings on all factors is at variance with Thurstone’s 
requirement (14, p. 335) that each row of the factorial matrix have at least 
one zero loading. In the opinion of the author this becomes an unnecessary 
restriction in case the basic requirements previously listed are met. 

The last five, or operational, requirements relate to desirable aspects 
of objective criteria for simple structure. Requirement six could be met by 
the establishment of a range of projections, centering on zero, to be inter- 
preted as negligible or zero projections. The limits for this range could be 
considered as generalized constants to be defined by the analyst on a priori 
grounds. A best fit of the data in some statistical sense as per requirement 
seven is certainly desirable. That this best fit should be unbiased, as per 
requirement eight, is also desirable. It is this requirement, however, that is 
likely to differentiate between an ideal objective criterion and various approxi- 
mate ones. Requirements nine and ten are quite crucial, but at the same 
time may be the most difficult to satisfy. The statistical test of requirement 
nine is necessary for scientific acceptability, but it may be the last point to 
be solved for objective criteria for simple structure. The automatic computing 
procedures should be as economical as possible. It may be, however, that the 
computations for an ideal objective criterion will be so complex and extensive 
that such a criterion will be applied only to a few critical studies. Approximate 
criteria that involve simple computations might be adequate in many cases 
and would be highly desirable. Developments in high-speed computers, 
however, may influence the relative economies of the criteria. 


Review of Previously Proposed Objective Criteria for Simple Structure 


Turning next to an examination of proposed analytical definitions and 
procedures for a simple structure solution, Thurstone’s equation for a simple 
structure will be considered first (11; 14, pp. 354-356). Thurstone makes 
the interesting proposal that his equation 28 is the equation for a simple 
structure. 


$$\prod_{p=1}^{r} \Bigl( \sum_{m=1}^{r} a_{jm}\, \lambda_{mp} \Bigr) = 0, \tag{1}$$


where p indicates simple structure factors, r is the number of factors, m
indicates reference factors, a_{jm} is a coordinate of a point on reference factor m,
and λ_{mp} is the direction cosine on reference factor m of simple structure
factor p. This equation states, in essence, that the product of the projections
for each vector separately on the normals to the hyperplanes should be zero.
This could be accomplished by the existence of at least one zero projection
for each vector. A least squares function for determining a best fit of the
equation to data is suggested in Thurstone’s equation 32.


[Equation (2) is illegible in the source scan.]
toward having each hyperplane determined in (r − 1) dimensions. The
third requirement is also satisfied in that a complete set of factors are con- 
sidered. In the area of types of freedom permitted, either oblique or ortho- 
gonal factors may be used. There is a relation, however, between the use of 
complex variables and obtaining an unbiased fit to the data (requirements 
five and eight). Following a presentation of illustrative applications of his 
criterion, Carroll points out the biasing effects of complex tests and concludes
that “These considerations lead to the conclusion that the present criterion 
will probably work best for well-designed factor studies where there are a 
large number of factorially pure tests and a relatively small number of 
factorially complex tests.” (2, p. 33.) Requirements seven and ten are satisfied 
in that Carroll presents an objectively determined best fit and a procedure for 
accomplishing it. The procedure is laborious, but might be programmed for 
electronic computers. Requirements six and nine are not satisfied, but might 
be so by further developments and definitions. We conclude that Carroll’s 
proposal is highly promising as an approximate method. It does satisfy the 
basic requirements, and tends to do so also for the types of freedom permitted, 
but it has some undesirable properties in the operational requirements area 
such that we agree with Carroll that his method is to be considered as yielding 
an approximation to simple structure. 

Saunders (9, 10) has proposed a criterion for an approximation to simple 
structure involving the sum of fourth powers of factor loadings on orthogonal 
axes. Since it can be shown that Saunders’ criterion is mathematically iden- 
tical with Carroll’s criterion discussed above when the orthogonal case is 
considered, we need not discuss Saunders’ work extensively. In addition to an
interestingly different and simpler computational procedure from that of
Carroll, Saunders presents some comparisons of results from actual studies
with results that were obtained from chance configurations of vectors. The
results are quite promising. 

Several other interesting recent publications involving closely related 
work to Carroll’s development include articles by Ferguson (4), Neuhaus and 
Wrigley (7), and Pinzka and Saunders (8). Ferguson, starting from informa- 
tion theory, suggests using the sum of squares of products of factor loadings 
as a measure of parsimony, or lack of parsimony. Neuhaus and Wrigley in 
their quartimax method maximize the sum of the fourth powers of the factor 
loadings. A point of interest is their use of the Illiac (a high speed electronic
computer). Pinzka and Saunders extended Saunders’ solution to the oblique 
case. The discussion of the preceding two paragraphs applies directly to all 
three of these papers. 
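The close relation among these criteria in the orthogonal case can be illustrated for two factors. For any row of loadings (x, y), (x² + y²)² = x⁴ + y⁴ + 2x²y², and the left side is unchanged by orthogonal rotation, so maximizing the sum of fourth powers is the same as minimizing the sum of squared products. The loadings below are hypothetical, and the coarse one-degree grid search is only a stand-in for the computing procedures of the cited authors.

```python
import math


def rotate(loadings, angle):
    """Orthogonal rotation of two-factor loadings through `angle` radians."""
    cs, sn = math.cos(angle), math.sin(angle)
    return [(x * cs + y * sn, -x * sn + y * cs) for x, y in loadings]


def fourth_power_sum(loadings):
    """Sum of fourth powers of the loadings (to be maximized)."""
    return sum(x ** 4 + y ** 4 for x, y in loadings)


def cross_product_sum(loadings):
    """Sum of squared products of loadings within each row (to be minimized)."""
    return sum((x * y) ** 2 for x, y in loadings)


# Hypothetical unrotated loadings for four tests on two factors (assumed).
loadings = [(0.6, 0.5), (0.7, 0.4), (0.5, -0.6), (0.4, -0.7)]

angles = [math.radians(d) for d in range(0, 90)]
best_by_fourth_powers = max(angles, key=lambda t: fourth_power_sum(rotate(loadings, t)))
best_by_products = min(angles, key=lambda t: cross_product_sum(rotate(loadings, t)))

# The quantity Q + 2F is the same for every rotation.
invariant = [
    fourth_power_sum(rotate(loadings, t)) + 2 * cross_product_sum(rotate(loadings, t))
    for t in angles
]
```

Both searches select the same angle, which is one concrete sense in which the comments on one criterion carry over to the others.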

Thurstone in 1936 (12) proposed an analytic solution for simple structure 
involving a least squares solution of projections for a sub-group of variables 
for each hyperplane. The sub-group of variables was selected in terms of 
limiting sizes of projections on successive trials of an iterative procedure. 
All of our requirements are met explicitly except two, three, and [...]. The
method used for selection of variables for the sub-groups allows the possibility
that the essential dimensionality of the space spanned by the sub-group
would be less than (r − 1). By essential dimensionality we mean the number
of dimensions in which some vectors for the sub-group have projections that
would not be interpreted as zero (that is, less than the stated limits on size
of projections used in selecting the variables). In our proposal, to be discussed
later, an objective procedure is indicated which will circumvent our objection
to this method of Thurstone. For Thurstone’s method as he proposed it, we
feel that failure to guarantee that the sub-group spanned an (r − 1)-dimensional
space was a serious drawback which would make the method unacceptable.
Requirement three could be met for each study by a succession of
solutions, each involving location of a single hyperplane, until as many
distinct hyperplanes were found as there were dimensions in the common-factor
space of the study. There is no guarantee, however, that all such
hyperplanes could be found.


A variant of Thurstone's preceding [...] (6), in which he maximized the ratio of
[...] projections to the sum of squares of all [...]. This is mathematically
equivalent to min[...] squares of the non-significant projections [...]
projections. Again the difference between significant and non-significant
projections was made in practice on size of projection in successive trials. Comments
on this method are identical with those on the preceding method.
[...] designed to insure that the sub-groups [...]
ions. In that these procedures involve [...]
increase the chance of involving variables spanning (r − 1) dimensions
in the determination of the hyperplane. In that the range of projections 
that receive finite weights is broad there is a chance that the solution could 
be biased in the sense used in our requirement eight. Vectors with significant 
projections on the normal could influence the location of the normal and 
thus produce a non-zero mean of non-significant projections even when 
the variance of the non-significant projections was low. We conclude that 
this latest objective method should be classified as an approximate procedure. 
It may be a very useful procedure, however, since the computations are 
quite simple and the results presented by Thurstone indicate good approxi- 
mations to the desired results. 


Definition of Simple Structure by Linear Constellations and Vector Masses 


In the objective definition of simple structure proposed here a concept 
of linear constellations is employed. Consider the left half of Figure 1. This
is a two-dimensional view of a factorial geometric model. Other dimensions
are orthogonal to the plane of the figure. Each dot is the projection of the
terminus of a vector representing a variable included in the battery being
analyzed. It is postulated for Figure 1 that the battery of variables is such
that the vectors might appear in a band such as is shown. If a direction is
chosen orthogonal to this band, the vectors represented by the dots
concentrated in this band will have small projections. In terms of a parametric
explanation of the variances of the variables there will be a corresponding
low dependence of these variables on a parameter corresponding to the
direction orthogonal to the band. Such concentrations of vectors into linear
spaces which include the origin may be termed linear constellations.

Figure 1

At the right of Figure 1, a line through the band of points and two 
bounding lines have been drawn to indicate the space of the linear constella- 
tion and the limits for projections outside this space. In general, linear 
constellations may be of any dimensionality less than that of the common- 
factor space for the entire battery of variables. When the constellation
contains only one dimension it would be called a cluster. This one dimension
would represent a single parameter and could be interpreted. In case the
linear constellation has as many dimensions less one as the common-factor
space, the constellation may be designated by the normal orthogonal
to the constellation. This normal can be used to indicate a parameter not
involved in the constellation. The projections of the vectors on this normal
will indicate the extent of dependence of the observed variables on this [...]
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1. Appended to and equally on both sides of any and every hyperplane in the com- 
mon-factor space is a marginal space of some defined and limited width. 


2. Any vector located entirely within a hyperplane and its Marginal space shall be 
considered as contained in the hyperplane. 
3. The number of vectors с 


ontained in а hyperplane shall be termed the vector 
mass of the hyperplane. 
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the linear constellation, 


Definition of a simple structure adds the follow 


6. A simple Structure is constituted by 


р the hyperplanes for a set of r linear con- 
stellations of dimensionality (r — 1). 


ing step: 




L. R TUCKER 219 


of the defined radial width in the other two dimensions. In a three-dimensional 
factorial space such a group of vectors would form a cluster around a single 
direction. This group of vectors is contained in any hyperplane whose normal 
lies in the two-dimensional plane orthogonal to the given (r — 2)-dimensional 
space containing the group of vectors. Any hyperplane that contains just 
this group of vectors, therefore, may be rotated without loss of this group of 
vectors and may be made to contain one or more vectors not contained in 
the given (r — 2)-dimensional space. This step depends on the existence of 
vectors not contained in the (r — 2)-dimensional space, but such vectors 
must exist for the common-factor space to be of r dimensions. Thus, the
vector mass of the hyperplane can be increased before any decrease occurs, 
and the original position of the hyperplane did not possess a maximum vector 
mass. This argument can be extended to vector groups contained in spaces 
of (r - 3) or fewer dimensions. In consequence, a maximum vector mass
occurs only when the vectors contained in the hyperplane are not contained 
also in a space of (r — 2) or fewer dimensions; that is, the vectors contained 
in such a hyperplane must span a space of (r — 1) dimensions. 
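
The notion of vector mass used in this argument can be sketched numerically. The following is only an illustration under stated assumptions: the function name `vector_mass`, the half-width of the marginal space, and the four test vectors are hypothetical and do not come from the paper.

```python
import numpy as np


def vector_mass(vectors, normal, half_width):
    """Count the vectors contained in a hyperplane and its marginal space.

    A vector counts as contained when its projection on the unit normal
    of the hyperplane is no larger in absolute value than half_width.
    """
    normal = np.asarray(normal, dtype=float)
    normal = normal / np.linalg.norm(normal)  # trial normals are unit vectors
    projections = np.asarray(vectors, dtype=float) @ normal
    return int(np.sum(np.abs(projections) <= half_width))


# Hypothetical test vectors in a three-dimensional common-factor space.
vectors = np.array([
    [0.70, 0.05, 0.10],
    [0.10, -0.08, 0.80],
    [0.50, 0.02, 0.50],
    [0.20, 0.60, 0.30],
])

# Hyperplane orthogonal to the second axis, margin of 0.10 on each side.
mass = vector_mass(vectors, normal=[0.0, 1.0, 0.0], half_width=0.10)
print(mass)
```

Because the margin is applied to the absolute value of the projection, it is appended equally to both sides of the hyperplane, as definition one requires.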

The simple structure is defined in step six as being constituted by r 
hyperplanes, which is the dimensionality of the common-factor space (require- 
ment three). No limitations are placed as to oblique or orthogonal factors 
or as to complexity of a minority of the tests (requirements four and five).
A defined limit for projections of vectors to be contained in the hyperplane 
is indicated in our definitions one and two (requirement six). The linear 
constellations are objectively defined by maximum vector mass (requirement 
seven). This definition is unbiased since the marginal space of definition one
is appended equally to both sides of the hyperplane (requirement eight). 

It is hoped that one could derive a statistical test such as is indicated in 
requirement nine. Such a development would make a definite contribution 
to the field of factor analysis. At present, however, this requirement for a 
satisfactory criterion of simple structure has not been satisfied. 


Computing Procedure for Linear Constellations 


An automatic method for searching for linear constellations, as per 
requirement ten, has been developed and tried out. The labor of computa- 
tions is quite great, but within bounds for automatic computing machinery. 
One trial has involved a run on an IBM Card Programmed Calculator. In
addition a careful check has been made in detail on the feasibility of per- 
forming the computations on the IBM Type 701 Electronic Computer. This 
machine could perform the required computations on an automatic basis 
within feasible time, such as 10 minutes for 50 variables in 10 dimensions 
for each linear constellation.

It is of interest that the method finally adopted as feasible is a combination
of two methods neither of which is feasible. The first of these methods
might be termed a direction survey method because it involved setting up
a network of directions as trial normals to hyperplanes, computing the
projections in each of these directions, and then determining the vector masses
by counts of projections less in absolute value than some defined limit.
Directions with maximum vector masses would be selected as normals to
the spaces of linear constellations. Except in limited cases when the
dimensionality of the space to be surveyed is small, the number of directions in
even very rough networks becomes very great and this method is not feasible.
The second method would consider various combinations of vectors as trial
sub-groups. For each sub-group a direction could be determined such that
the sum of squares of projections of the sub-group was a minimum. The largest
sub-groups would be sought subject to the condition that all members of each
sub-group have projections less in absolute value than some limit on the
direction with minimum sum of squares of projections for the sub-group.
Because of the large number of combinations of variables to be considered
for any study [...]

Step 1: List a matrix $F_s$ for a selected sub-group of tests (see Table 2).
For the first cycle the sub-group might be taken as those tests that have 
low correlations with some particular test. In experimental applications 
of this method, initial sub-groups were usually taken to contain approxi- 
mately half of the tests in the battery. It was found that each of the linear 
constellations resulted from several different initial sub-groups. Enough 
different initial sub-groups were used for the study employed in the example 
to be able to find three distinct linear constellations. Two points of general 
concern are the recognition of duplicating results and being able to find all 
existing linear constellations. Any duplication can be readily detected by 
comparisons of the solutions and may be eliminated by discarding results 
from one or more initial sub-groups. The problem of selection of sub-groups 
so as to be able to find all linear constellations is much more difficult. After a 
number of constellations are found, a vector might be set orthogonal to them 
and tests selected that have low projections on this vector. Another possi- 
bility is to first employ a method such as Carroll’s (2), or Saunders’ (9, 10) 
and to establish initial sub-groups of tests with low projections on each of the 
factors so determined. 

The initial sub-group in the example contains tests 3, 4, and 1. For the 
second and subsequent cycles the sub-groups are given by the preceding cycle. 

Step 2: Compute the matrix $P_s$ (see Table 2).

$$P_s = F_s' F_s. \qquad (3)$$

Step 3: Compute the two smallest characteristic vectors of $P_s$ (see
Table 2). These are the characteristic vectors corresponding to the two
smallest characteristic roots of $P_s$. The smallest vector is $C_1$ and the next
to smallest vector is $C_2$. Each of these vectors is to be a unit vector (have
sum of squares of entries equal to unity). The matrix containing these two
vectors is labeled A in Table 2.


Step 4: Compute the matrix of projections, $V$, of all the tests on the two
smallest characteristic vectors (see Table 2).

$$V = FA. \qquad (4)$$


Figure 2

Step 5: Survey the space of the two smallest characteristic vectors for
the radial band of specified width which includes the largest number of test
vectors. The concept involved is illustrated in Figure 2. A plot between
projections of the tests on $C_1$ and $C_2$ is shown on the left. The dots for our
trial sub-group of tests 3, 4, and 1 are located near the origin. Centered on
$C_1$ and indicated by short lines outside the circle are eleven directions
separated by 9°. The line with an arrow is pointing in the direction of -36°.
Orthogonal to this trial normal is a line for the tentative linear subspace
and two limit lines. The trial normal was also placed in each of the other
ten selected directions. The short lines inside the circle indicate
corresponding locations of the linear subspace. For each of the set of directions in
this survey, a count was made of the number of points between the corre[...]


in the space between the limits. The ten tests for the dots lying between the 
limit lines were selected for the next sub-group for the next cycle. 
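
One cycle of Steps 1 to 5 might be sketched as follows. This is a minimal sketch, not the program run on the IBM machines: it assumes that $P_s$ is the cross-product matrix of the sub-group rows of the factor matrix and that $V$ is the product of the full factor matrix with the two smallest characteristic vectors. The function name `constellation_cycle`, the half-width, and the twelve-test factor matrix are hypothetical.

```python
import numpy as np


def constellation_cycle(F, subgroup, half_width, n_directions=11, step_deg=9.0):
    """One cycle of the search: returns a trial normal and the next sub-group."""
    F = np.asarray(F, dtype=float)
    F_s = F[list(subgroup)]                  # Step 1: rows for the sub-group
    P_s = F_s.T @ F_s                        # Step 2: cross-product matrix
    _, vectors = np.linalg.eigh(P_s)         # characteristic roots ascending
    A = vectors[:, :2]                       # Step 3: unit vectors C_1, C_2
    V = F @ A                                # Step 4: projections of all tests
    # Step 5: survey directions centred on C_1 in the plane of C_1 and C_2.
    offsets = (np.arange(n_directions) - n_directions // 2) * step_deg
    best_members, best_trial = np.array([], dtype=int), None
    for angle in np.deg2rad(offsets):
        trial = np.array([np.cos(angle), np.sin(angle)])
        members = np.flatnonzero(np.abs(V @ trial) <= half_width)
        if best_trial is None or len(members) > len(best_members):
            best_members, best_trial = members, trial
    normal = A @ best_trial                  # trial normal in the factor space
    return normal, best_members


# Hypothetical battery: tests 0-7 lie near the plane orthogonal to axis 3.
F = np.array([
    [0.70, 0.10, 0.02], [0.10, 0.70, -0.03], [0.50, 0.50, 0.01],
    [0.60, 0.30, -0.02], [0.30, 0.60, 0.03], [0.40, 0.20, 0.00],
    [0.20, 0.50, -0.01], [0.65, 0.45, 0.02], [0.10, 0.10, 0.80],
    [0.20, 0.10, 0.70], [0.10, 0.30, 0.75], [0.30, 0.20, 0.65],
])
normal, members = constellation_cycle(F, subgroup=[0, 1, 2], half_width=0.10)
print(members)
```

The returned members would serve as the sub-group for the next cycle, so the function can be called repeatedly until the sub-group stops changing.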


In practice, the plots in Figure 2 would not be made since the operations
can be performed by computing steps illustrated in Table 3 and outlined
below:


[...] a coarse survey set [...] survey set was used in Table 3.

F-TEST BIAS FOR EXPERIMENTAL DESIGNS IN
EDUCATIONAL RESEARCH

UNIVERSITY OF BIRMINGHAM


Reference is made to Neyman's study of F-test bias for the randomized
blocks and Latin square designs employed in agriculture, and some account
is given of later statistical developments which sprang from his work, in
particular the classification of model-types and the technique of variance
component analysis. It is claimed that there is a need to carry out an
examination of F-test bias for experimental designs in education and psychology
which will utilize the method and, where appropriate, the known results
of this new branch of variance analysis. In the present paper, such an
investigation is carried out for designs which may be regarded as derivatives of the
agricultural randomized blocks design. In a paper to follow, a similar
investigation will be carried out for experimental designs of the Latin square type.


I. Introduction 


F-test bias may be said to exist for a given experimental situation if, 
when the null hypothesis is valid, frequent replication of the experiment 
provides a distribution of F-values which does not conform in some way 
(within the limits of sampling error) to the corresponding theoretical F-distri- 
bution. When bias exists, it is important for the investigator to know whether 
the F-test (the null hypothesis being valid) gives a larger or smaller proportion 
of significant F-ratios than is warranted by the theoretical distribution. 

The possibility of F-test bias for certain experimental designs first 
became a topic of major statistical interest with Neyman’s paper (14) in 
1935. Neyman confined his inquiry to the randomized blocks and Latin 
square designs, which Fisher had developed; these designs had become the 
mainstay of agricultural experimentation. In both cases, he pointed out 
that “the conditions under which the application of the z-distribution is 
legitimate are not strictly satisfied” and went on to show that “in the case
of the randomized blocks the position is somewhat more favorable to the 
z-test, while in the case of the Latin square this test seems to be biased, 
showing the tendency to discover differentiation when it does not exist." 

Neyman's conclusions met at first with considerable opposition, but as 
Kendall (10, p. 214) points out, the controversy arose mainly from a failure 
to realize that Neyman was dealing with a different hypothesis from that 
usually tested. Thus, Fisher was concerned with the hypothesis that for 
each plot in the experimental field the treatments had the same effect. Ney- 
man, on the other hand, stressed the possibility of interactions between 
plots and treatments and considered the more general hypothesis that the
mean effects of treatments over all plots involved in the experiment were
the same. In 1937, Welch (20) clearly distinguished between the two
hypotheses and went on to show that the z-distribution furnished an
approximate test of the Fisher hypothesis both for the randomized blocks and Latin
square designs. His findings did not, of course, invalidate in any way
Neyman's analysis.

With regard to the validity of Neyman's analysis, it might be pointed
out that while Neyman insists that the correction for fertility of a plot may
vary from treatment to treatment, he regards the fertility corrections for
blocks (in the case of randomized blocks) and for rows and columns (in the
case of the Latin square) as being the same for all treatments. This assumption
does not appear justifiable and, if not made, Neyman's results might be
modified considerably.


ariance theory 
(5)]. Further, 
the types of 
used on the 


pe 1 à analysis in su ort, that 
this assumption will lead to too many Monica F's, he 
Lu 
9m appropriate and that 
Sound a priori reasons 
ould be true to say 
of the other investi-
psychology. Secondly, many of the articles are of too recent origin for their
results to appear to any great extent in the textbooks and research publica- 
tions belonging to the latter field. 

There is obviously a need to carry out an examination of F-test bias 
for experimental designs employed in education and psychology. Some 
such research has already been reported but it requires to be supplemented. 
The method and, where appropriate, the known results of variance component 
analysis should be utilized. In the present paper such an investigation is 
carried out for designs which may be regarded as derivatives of the agri- 
cultural randomized blocks design. In a second paper, the same will be done 
for those designs of which the agricultural Latin square is the prototype. 


II. Method


For a valid (i.e., unbiased) F-test, the two variances involved in the
F-ratio must be independent unbiased estimates— based on the stated numbers 
of degrees of freedom—of the same normal population variance. Test bias 
arises when the variances of the F-test fail to satisfy this set of conditions 
in one or more respects. It follows that, in order to detect bias, it is sufficient 
to examine the data—by simple inspection or by statistical analysis—for any 
failure to comply with the conditions of normality, homogeneity of variance, 
independence of estimates, etc. 

It is not, however, sufficient to know that bias exists. An investigator 
also wants to know (a) the direction of the bias and (b) its magnitude; or, 
at least, he wants some indication of the answer to both these questions. 
For convenience, we will define an F-test to be positively or negatively 
biased, if, in the case where the null hypothesis being tested is correct, the 
test produces a larger or smaller proportion, respectively, of significant F-ratios
than is warranted by the F-distribution. 

In this paper considerable use will be made of Neyman’s procedure 
(i.e., variance component analysis) as a method of detecting bias and of 
indicating its direction and magnitude. (The other and perhaps more common 
use of this form of analysis to obtain estimates of variance components will 
be involved only incidentally.) The method consists simply of taking the 
mathematical model which applies to the experimental situation and deriving 
analytically the expected values of the variances involved in the F-test. 
Then, in the case where the null hypothesis holds, the expected value of the 
“treatments” variance will be equal in magnitude to that of the “error” 
variance if no bias is present. When the two expected values are unequal, 
positive or negative bias is suggested according as the first variance is greater 
or less than the second. Also some measure of the magnitude of the bias is
provided by the amount the ratio of the two expected values differs from
unity. For ease of exposition, we shall refer to this ratio as the B-ratio. Thus
positive or negative bias is suggested by B-ratios greater or less than unity,
respectively.


This method has several limitations:

(i) A B-ratio of unity is a necessary but not a sufficient condition for
zero bias (the empirical F-distribution may have the same mean as the
theoretical F-distribution but may differ from the latter in respect of standard
deviation or any other moment). Consequently, it frequently happens that
bias is present although the B-ratio is unity. Several instances of this appear
in the present study.


(ii) As might be expected from


ratio from unity is only à 
- Obviously aecount must 


matical difficulty, are not 


; empirical methods must be adopted. No such empirical 


studies are attempted i 
Studies of this type. 


III. Models 


All the models involved in this 
two-way classificat; 


paper are related to them. 


Eisenhart (8) distinguishes three types of models. In describing these 
we shall follow Crump (5) and adopt a broader interpretation than that
chosen by Eisenhart. 


Model I (Fixed variate model) 
This may be written

$$X_{rst} = \mu + A_r + B_s + I_{rs} + e_{rst}, \qquad r = 1, \ldots, p;\; s = 1, \ldots, q;\; t = 1, \ldots, n, \qquad (1)$$

where $X_{rst}$ [...] respectively, $I_{rs}$ denotes the interaction effect for the
rth column and sth row, and $e_{rst}$ is the random error for the observation.

The populations of A's, B's and I's are all finite (of zero mean) and
are exhausted in the given p × q classification, but the population of e's is
continuous with a normal probability distribution of variance $\sigma_e^2$.

The expected values of the mean squares involved in the analysis of 
variance of the data are as follows: 


                 d.f.              Expected Value of Mean Square
Columns          p - 1             $\sigma_e^2 + qn \sum_r A_r^2/(p - 1)$
Rows             q - 1             $\sigma_e^2 + pn \sum_s B_s^2/(q - 1)$
Interaction      (p - 1)(q - 1)    $\sigma_e^2 + n \sum_r \sum_s I_{rs}^2/[(p - 1)(q - 1)]$
Residual         pq(n - 1)         $\sigma_e^2$

It will be seen that the null hypotheses (i) $A_r = 0$ ($r = 1, \ldots, p$);
(ii) $B_s = 0$ ($s = 1, \ldots, q$); (iii) $I_{rs} = 0$ ($r = 1, \ldots, p$; $s = 1, \ldots, q$)
are tested by examining the significance of the F-ratios of columns, rows,
and interaction, respectively, with respect to residual. Normally when inter- 
action is significant, the investigator is not interested in making the test for 
columns and rows although there is no theoretical objection to his doing so. 
(Eisenhart actually restricts Model I to the case of zero interaction by making 
his second assumption of additivity). 
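
The expected mean squares for Model I, and the B-ratio they imply, can be tabulated with a few lines of code. This is a sketch under the assumption that the table above reads as the standard fixed-effects result; the function names and the numerical effects are hypothetical.

```python
import numpy as np


def model1_expected_mean_squares(A, B, I, n, var_e):
    """Expected mean squares for the fixed variate model (Model I).

    A, B are the column and row effects, I the p x q interaction effects,
    n the number of observations per subclass, var_e the error variance.
    """
    A, B, I = (np.asarray(x, dtype=float) for x in (A, B, I))
    p, q = len(A), len(B)
    return {
        "columns": var_e + q * n * np.sum(A ** 2) / (p - 1),
        "rows": var_e + p * n * np.sum(B ** 2) / (q - 1),
        "interaction": var_e + n * np.sum(I ** 2) / ((p - 1) * (q - 1)),
        "residual": var_e,
    }


def b_ratio(ems, effect, error="residual"):
    """Ratio of the expected mean squares of the tested and the error line."""
    return ems[effect] / ems[error]


# Hypothetical case: no column effects, real row effects, no interaction.
ems = model1_expected_mean_squares(
    A=[0.0, 0.0, 0.0], B=[-1.0, 0.0, 1.0], I=np.zeros((3, 3)), n=4, var_e=2.0
)
print(b_ratio(ems, "columns"), b_ratio(ems, "rows"))
```

With the column hypothesis true the B-ratio for columns is unity, which is the necessary (though not sufficient) condition for an unbiased test described in the Method section.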


Model II (Random variate model) 
This may be written 


$$X_{rst} = \mu + \alpha_r + \beta_s + \eta_{rs} + e_{rst}, \qquad r = 1, \ldots, p;\; s = 1, \ldots, q;\; t = 1, \ldots, n, \qquad (2)$$

where the terms may be described as for the corresponding members of
Model I but, in this case, the p α-values, q β-values and pq η-values are
random samples from normal distributions of zero mean and of variance
$\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_\eta^2$, respectively.

The expected values for the mean squares in the variance analysis are: 


                 d.f.              Expected Value of Mean Square
Columns          p - 1             $\sigma_e^2 + n\sigma_\eta^2 + nq\sigma_\alpha^2$
Rows             q - 1             $\sigma_e^2 + n\sigma_\eta^2 + np\sigma_\beta^2$
Interaction      (p - 1)(q - 1)    $\sigma_e^2 + n\sigma_\eta^2$
Residual         pq(n - 1)         $\sigma_e^2$

The null hypothesis $\sigma_\eta^2 = 0$ is tested by testing interaction against
residual. The null hypotheses $\sigma_\alpha^2 = 0$, $\sigma_\beta^2 = 0$ are tested by testing
columns and rows, respectively, against interaction or, where it is known a priori
that $\sigma_\eta^2 = 0$, against total residual of $(pqn - p - q + 1)$ d.f. (It will be
seen from the table that when $\sigma_\eta^2 = 0$, columns and rows can be tested
against either interaction or residual.)


Mixed Model 
This takes the form

$$X_{rst} = \mu + A_r + \beta_s + \eta_{rs} + e_{rst}, \qquad r = 1, \ldots, p;\; s = 1, \ldots, q;\; t = 1, \ldots, n, \qquad (3)$$

where the population of A-values is finite (of size p) but the β- and η-values
are random samples from infinite populations. The expected values of the
mean squares now read:

                 d.f.              Expected Value of Mean Square
Columns          p - 1             $\sigma_e^2 + n\sigma_\eta^2 + nq \sum_r A_r^2/(p - 1)$
Rows             q - 1             $\sigma_e^2 + n\sigma_\eta^2 + np\sigma_\beta^2$
Interaction      (p - 1)(q - 1)    $\sigma_e^2 + n\sigma_\eta^2$
Residual         pq(n - 1)         $\sigma_e^2$


Tests of hypotheses are made as for Model II.

Useful as the above classification is, it fails to cover many of the cases
which occur in practice. Thus, the models of Fish[...]
agricultural randomized blocks design bel[...]
more extensive classification has been pro[...]

(4)

The main difference between this and Eisenhart's Mixed Model is the
additional random error term $\xi_{rs}$ common to all observations in the subclass
(r, s). As will be seen later, this term differs from the η-term in that the
ξ-values are usually regarded as independent (uncorrelated) while the η-values
may be correlated and, what is more, show heterogeneity of correlation
(between columns).

The next section, dealing with the investigation of bias for these models, 
falls conveniently into three parts: 


1. Equal numbers in subclasses, i.e., $n_{rs}$ = const. = n, say.

2. Numbers in subclasses unequal but proportional, i.e., $n_{rs} = N a_r b_s$,
where N is total number of cases sampled and $a_1, \ldots, a_p$ and $b_1, \ldots, b_q$
are the proportions of cases in columns and rows, respectively
($\sum_r a_r = 1 = \sum_s b_s$).

3. Numbers in subclasses unequal and disproportionate.

Types of bias common to all three cases are discussed in the first part. 
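
The three cases can be told apart mechanically from a table of subclass numbers. The sketch below is illustrative only; the function name `classify_subclass_numbers` and the example counts are hypothetical, and the first array index is taken to be r and the second s.

```python
import numpy as np


def classify_subclass_numbers(counts):
    """Classify a table of subclass numbers n_rs as equal, proportional,
    or disproportionate."""
    n = np.asarray(counts, dtype=float)
    if np.all(n == n.flat[0]):
        return "equal"
    total = n.sum()                  # N, the total number of cases
    a = n.sum(axis=1) / total        # proportions a_r
    b = n.sum(axis=0) / total        # proportions b_s
    # Proportional numbers satisfy n_rs = N * a_r * b_s in every subclass.
    if np.allclose(n, total * np.outer(a, b)):
        return "proportional"
    return "disproportionate"


print(classify_subclass_numbers([[5, 5], [5, 5]]))
print(classify_subclass_numbers([[10, 20], [20, 40]]))
print(classify_subclass_numbers([[10, 20], [20, 10]]))
```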


IV. Investigation and Results 


1. Equal Numbers in Subclasses ($n_{rs} = n$)


It will make the discussion more concrete and less theoretical if we 
speak in terms of a methods experiment replicated in a random sample of 
schools. Lindquist (12) gives an excellent account of the experimental design 
and statistical analysis required for this type of experiment. The main F-test 
in the analysis is that of the methods variance against the interaction variance.
The hypothesis tested is that the methods have the same mean effect over 
the total population of schools. 

The interaction term of the analysis not only contains sampling error 
(measured by the variance within classes) but it may, and usually does, 
contain two other elements: 

(i) real interaction between methods and schools; 

(ii) group errors, i.e., errors which apply to the experimental groups as 
wholes and which are produced by factors other than method and school 
differences, e.g., teacher differences. 

It will be seen that the model for this type of design is a version of the 
special mixed model mentioned at the end of the last section, namely, 


$$X_{rst} = \mu + A_r + \beta_s + \eta_{rs} + \xi_{rs} + e_{rst}, \qquad r = 1, \ldots, p;\; s = 1, \ldots, q;\; t = 1, \ldots, n, \qquad (5)$$

where μ is the general mean and the A, β, η, ξ and e represent the effects due
to methods, schools, interaction, group error, and sampling error, respectively.
As usual $\sum_r A_r = 0$. $\xi_{rs}$ and $e_{rst}$ are random, the parent populations
being assumed to be normal, of zero mean and of variance $\sigma_\xi^2$ and $\sigma_e^2$,
respectively. $\beta_s$ is usually defined so that $(\mu + \beta_s)$ is the mean for the sth school
over the p methods (and the total population of ξ- and e-values); but it
might be more instructive if we here take $(\mu + \beta_s)$ to be the mean of the
sth school over a population of methods which includes the p methods under
consideration. The parent population of β-values will be assumed to be
normal, of variance $\sigma_\beta^2$ (the mean of course is zero).

Since some of the methods within the total population of methods will
normally resemble one another more than they do the others, the η-terms
for these methods (assuming there is real interaction between methods
and schools) will be more highly correlated with one another than with the
other η-terms. We shall now assume that for our p methods (and the total
population of schools) the η-terms are equally correlated, with correlation ρ;
we shall also assume that they are normally distributed with the same
variance $\sigma_\eta^2$ for each method. The population mean will in each case be zero.

With this definition of our model the expected values of the mean squares
for the analysis of variance are as in Table 1.

TABLE 1
Variance             d.f.              Expected Value of Mean Square
Methods              p - 1             $\sigma_e^2 + n[\sigma_\eta^2(1 - \rho) + \sigma_\xi^2] + qn \sum_r A_r^2/(p - 1)$
Schools              q - 1             $\sigma_e^2 + n[\sigma_\eta^2(1 - \rho) + \sigma_\xi^2 + p\rho\sigma_\eta^2] + np\sigma_\beta^2$
Methods × Schools    (p - 1)(q - 1)    $\sigma_e^2 + n[\sigma_\eta^2(1 - \rho) + \sigma_\xi^2]$
Within Classes       pq(n - 1)         $\sigma_e^2$

For the benefit of the reader who is doubtful of the procedure for obtaining
such a table, the derivation of the expected value of the mean square for
methods is reproduced here.
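The sum of squares between methods is given by

$$\sum_r qn M_r^2 - pqn M^2 \qquad (r = 1, \ldots, p), \qquad (6)$$

or more conveniently by

$$qn \sum_r (M_r - M)^2, \qquad (7)$$

where, a dot denoting a subscript over which the mean has been taken,

$$M_r - M = \frac{1}{qn} \sum_s \sum_t X_{rst} - \frac{1}{pqn} \sum_r \sum_s \sum_t X_{rst}
= A_r + \frac{1}{q} \sum_s (\eta_{rs} - \bar\eta_{\cdot s}) + \frac{1}{q} \sum_s (\xi_{rs} - \bar\xi_{\cdot s}) + \frac{1}{qn} \sum_s \sum_t (e_{rst} - \bar e_{\cdot st}). \qquad (8)$$

Substituting in (7), squaring out and taking the expected value of the resultant
expression, we obtain

$$qn \left\{ \sum_r A_r^2 + \frac{p - 1}{q} [\sigma_\eta^2(1 - \rho) + \sigma_\xi^2] + \frac{p - 1}{qn} \sigma_e^2 \right\},$$

which reduces to

$$qn \sum_r A_r^2 + n(p - 1)[\sigma_\eta^2(1 - \rho) + \sigma_\xi^2] + (p - 1)\sigma_e^2. \qquad (9)$$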


The required result follows. 
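
The step from (8) to (9) can be made explicit. The worked lines below are a sketch that assumes, beyond what the text states outright, that the η-, ξ- and e-terms are mutually independent and that different schools are independent; the bar-and-dot notation for means is introduced only for this illustration.

```latex
% For one school s the p terms \eta_{1s}, ..., \eta_{ps} have common variance
% \sigma_\eta^2 and common correlation \rho, so that
\begin{align*}
E \sum_r (\eta_{rs} - \bar\eta_{\cdot s})^2
  &= p\,\sigma_\eta^2 - p \operatorname{Var}(\bar\eta_{\cdot s})
   = p\,\sigma_\eta^2 - \sigma_\eta^2 [1 + (p - 1)\rho]
   = (p - 1)\,\sigma_\eta^2 (1 - \rho). \\
% Averaging over q independent schools divides each variance by q
% (and by qn for the sampling errors):
E \sum_r (\bar\eta_{r\cdot} - \bar\eta_{\cdot\cdot})^2
  &= \frac{(p - 1)\,\sigma_\eta^2 (1 - \rho)}{q}, \qquad
E \sum_r (\bar\xi_{r\cdot} - \bar\xi_{\cdot\cdot})^2
   = \frac{(p - 1)\,\sigma_\xi^2}{q}, \qquad
E \sum_r (\bar e_{r\cdot\cdot} - \bar e_{\cdot\cdot\cdot})^2
   = \frac{(p - 1)\,\sigma_e^2}{qn}. \\
% Cross-products have zero expectation, so multiplying by qn gives (9):
E\, qn \sum_r (M_r - M)^2
  &= qn \sum_r A_r^2 + n(p - 1)[\sigma_\eta^2 (1 - \rho) + \sigma_\xi^2]
     + (p - 1)\,\sigma_e^2 .
\end{align*}
```

Dividing the last line by the (p - 1) degrees of freedom for methods gives the entry for methods in Table 1.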

If, instead of the above definition of $\beta_s$, we define $\beta_s$ to be such that
$(\mu + \beta_s)$ is the mean of the sth school over the p methods only, then
$\sum_r \eta_{rs} = 0$ (as in the case of Model I). It is easy to show that ρ will now
have the value -1/(p - 1). [If we substitute this value for ρ in Table 1, the
expected value for the schools variance becomes $(\sigma_e^2 + n\sigma_\xi^2 + np\sigma_\beta^2)$, which
does not contain $\sigma_\eta^2$, a result which is obvious from the definition of $\beta_s$ now
being assumed.] It might be argued that the correlation ρ is an artifact,
since its value depends on the way the β-values are defined and ρ can thus be
made to have almost any value we please. But the reader should note that 
correlations in the variance component analysis cannot be avoided when 
there is heterogeneity of correlation between methods [case (c) below]. 
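
The substitution of ρ = -1/(p - 1) into the schools line of Table 1 can be checked with exact arithmetic. The sketch assumes the schools entry reads as the error variance plus n times the bracketed interaction and group-error terms plus np times the schools variance; the function name and the numerical variances are hypothetical.

```python
from fractions import Fraction


def schools_expected_ms(p, n, rho, var_e, var_eta, var_xi, var_beta):
    """Expected mean square for schools as read from Table 1."""
    bracket = var_eta * (1 - rho) + var_xi + p * rho * var_eta
    return var_e + n * bracket + n * p * var_beta


p, n = 4, 5
rho = Fraction(-1, p - 1)  # correlation implied by defining beta over the p methods only

# Two very different interaction variances give the same expected value.
ems_small = schools_expected_ms(p, n, rho, var_e=3, var_eta=1, var_xi=2, var_beta=7)
ems_large = schools_expected_ms(p, n, rho, var_e=3, var_eta=100, var_xi=2, var_beta=7)
print(ems_small, ems_large)
```

Both values equal 3 + 5(2) + 20(7), so the interaction variance has dropped out, as the bracketed remark in the text states.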

We will now consider three possible sources of bias for the methods v. 
interaction F-test. [Bias arising from non-normality in the data will not be 
considered in the present paper. Much work has been done in this field and,
while most of it has been concerned with the simpler applications of the
analysis of variance and not with more complex analyses such as may occur
in education, it is probably true to say that these findings have general 
application.] An application of Neyman’s technique to the modified form 
of the basic model for each of the three cases is useless, since the B-ratio is 
found to be unity. The results are not reproduced here. It is possible, however, 
to make some fairly definite pronouncements on the bias involved in each case. 


Case (a). Heterogeneity of variance within classes (from school to school) 

As a result of an empirical study, Lindquist and Godard (12, pp. 139-144)
concluded that this type of heterogeneity “will not seriously affect the validity 
of the test of significance of methods differences based on the ratio of the 
M and M X S variance.” A corollary to this result is that heterogeneity of 
group errors from school to school will not seriously bias the F-test. 


Case (b). Heterogeneity of variance within methods 


This type of heterogeneity may arise in two ways: either (i) the variance
within classes may vary from method to method; or (ii) the variance due to
“real” interaction may vary from method to method.

It seems unlikely, as Lindquist remarks (12, p. 144), that the o[...]
would produce sufficiently large differences in variability to disturb the
F-test seriously. But, where this did happen, the following remarks about
bias might be made:

(i) No bias results from this type of heterogeneity when only two methods are
involved. This can easily be established analytically.

(ii) With more than two methods, the bias is likely to be positive.
It is known that when a t-test is applied to two random groups of the same
size, heterogeneity of variance causes the test to be positively biased ([...],
p. 170). There is no contradiction between this result and that stated in the
previous paragraph. Heterogeneity of variance produces bias in the t-test
when applied to random groups but not when applied to matched groups.
The latter case corresponds to the replicated experiment with two methods.

It is very likely that the same holds for the F-test when applied to more
it would also apply to the 
ds are involved). A considera- 
usion. 
for a similar situation in agricultural 
Cox (4, pp. 396-398). A more general 


comparisons and the computation 
of separate error terms for each. It is better, however, if such a solution
is found to be unnecessary (involving as it does a considerable loss in degrees
of freedom).

Case (c). Heterogeneity of correlation between class means (within methods)

The point to be noted here is that some methods may be more alike
than others, and, conse[...] effects (with schools) will
be more closely related with o[...]

The type of bias present can be easily demonstrated with fictitious data 
for a highly theoretical case. (The example which follows probably affords 
a better understanding of the way in which the bias operates than is to be 
gained by any lengthy analysis). 

Consider an e i 


                          Schools
            1    2    3    4    5    6    7
Method A   39   55   45   47   46   40   51
Method B   38   40   [...]

Then, with equal numbers in the experimental groups, the methods
and interaction components of the variance analysis read:

                     d.f.   Sum of Squares   Variance   F-ratio
Methods               1         87.5           87.5      4.375
Methods × Schools     6        120             20


The F-ratio is not significant (for 1 and 6 d. f., F = 5.99 at the 5 per cent 
level of significance). 

Now suppose that a third method, C, had been incorporated in the 
experiment and let us take the extreme case of C being identical with B. 
Also, let us imagine, to present the argument in its simplest form, that in 
this experiment there is no sampling error and that the interaction term 
consists only of real interaction. Then the means for the school groups sub- 
jected to method C will be the same as those for the groups undergoing 
method B. An analysis of variance for the three methods will therefore still 
give the same value for the F-ratio; but now with 2 and 12 d. f., significance is 
obtained at the 5 per cent level (F = 3.88). 

We might consider what would happen with further replication of 
method B. Thus, with four replications, significance can be obtained at the 
1 per cent level (F = 4.22 for 4 and 24 d. f.). Obviously, with the given form 
of analysis, the replication process increases the number of degrees of freedom 
without producing any real increase in the precision of the comparison of 
the methods. With a separation of the methods comparisons, such as Cochran 
suggests (see above), the spurious effect can be avoided. 
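
The replication argument can be reproduced numerically. In the sketch below the Method A means are those printed in the example; the Method B means after the first two are hypothetical values chosen only so that the sums of squares match the printed 87.5 and 120, since the original row is not fully legible. The function name is likewise an assumption, and the common group size is omitted because it multiplies both mean squares equally.

```python
import numpy as np


def methods_vs_interaction_f(cell_means):
    """F-ratio of methods against methods x schools from a methods-by-schools
    table of group means (equal numbers in the groups, no sampling error)."""
    x = np.asarray(cell_means, dtype=float)
    p, q = x.shape
    grand = x.mean()
    method_means = x.mean(axis=1)
    school_means = x.mean(axis=0)
    ss_methods = q * np.sum((method_means - grand) ** 2)
    residual = x - method_means[:, None] - school_means[None, :] + grand
    ss_interaction = np.sum(residual ** 2)
    df_methods, df_interaction = p - 1, (p - 1) * (q - 1)
    f_ratio = (ss_methods / df_methods) / (ss_interaction / df_interaction)
    return f_ratio, df_methods, df_interaction


method_a = [39, 55, 45, 47, 46, 40, 51]
method_b = [38, 40, 48, 36, 45, 33, 48]  # hypothetical after the first two values

results = {}
for copies_of_b in (1, 2, 4):
    table = [method_a] + [method_b] * copies_of_b  # B, then B and C = B, etc.
    results[copies_of_b] = methods_vs_interaction_f(table)
    print(copies_of_b, results[copies_of_b])
```

The F-ratio stays fixed while the degrees of freedom grow from (1, 6) to (2, 12) to (4, 24), which is the spurious gain in significance described above.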

The fact that methods are never identical and that sampling and other 
errors are always present does, of course, considerably reduce the amount 
of bias of this type which can occur. It is very probable that in most practical 
cases it is not serious. The use of covariance analysis or any other technique 
which improves precision by reducing random error will, of course, increase 
the importance of real interaction and so the type of bias under discussion. 
Covariance, etc., will also increase the effect of bias resulting from hetero- 
geneity of variance of “real” interaction [see case (b) above]. 

Before concluding this section, two matters may be mentioned which 
are not irrelevant to the above discussion: 

(i) As several writers have pointed out [e.g., Lindquist (12, p. 98) ; Webb 
and Lemmon (19)] similarities between methods may also operate in an 
F-test to mask other significant methods differences present, [i.e., speaking 
more technically, such similarities reduce the power of the F-test, cf. John- 
son (9)]. Diamond (6) contends that the effect is normally small. It will be 
seen that, in the case of replicated methods experiment, both masking and 
case (c) bias may be present; it will also be seen that they are in opposition to
each other. Which predominates would depend on the relative importance
of real interaction and random error. 

(ii) It is to be observed that the analysis of variance of repeated
measurements for a group of individuals is similar in form to that for the replicated
methods experiment. Individuals correspond to schools and the sets of
measurements (or trials) correspond to methods. Also the main F-test is trials v.
interaction (individuals × trials) corresponding to the (M v. M × S)-test of the
methods experiment.

It follows that a somewhat similar discussion of bias is involved. Case (a)
does not arise but cases (b) and (c) are applicable.

It is very likely that the bias arising from heterogeneity of correlation
between interaction effects can be more serious in the repeated measurements
analysis than in the other. Since the sets of measurements must succeed
each other in time, this is bound to result in greater correlation between


186 (11) covers himself on this point 
the analysis depends on the assumptions 
€ linear and parallel and that deviations 
mally distributed and of equal variance 
nder's tests (1) as superior in that they 
vidual differences in regression. Certainly 
‹ 1 ble to reveal any heterogeneity of individual 
regression which may be present. But Lindquist does not make the obvious 
point that, as a result of this heterogeneity, Alexander can only apply his 
up he was considering and not for the larger 
ist was concerned. There would appear to be 
J, г 
study of trend: for the group o impri i ae Uo udo a e 


sly biased. 
2. Proportionate Numbers in Subclasses ($n_{rs} = N a_r b_s$) 


It is generally accepted that di 
of variance arise only wi i 


mns—whatever they represent— 
odel I falls into this category). 
t—common in education—where 
plies to a larger population (i.e., 
interaction variance has other components of variance besides that due to 
the variance within subclasses, proportionate numbers in the subclasses 
will introduce bias into the F-test of treatments against interaction. 

This type of bias would appear to have been discovered first by Smith 
(15), who gives the results of a variance component analysis of Eisenhart's 
Model II with proportionate numbers in the subclasses. Concerned as we 
are here with experimental designs common in education, it will be more 
instructive if we consider the results for the special mixed model which 
underlies the methods experiment replicated over a number of schools. 

The model is 


т=1,...,Р 
Хи = B+ А, + Be tee ЗЕ eu 8 = Levees В (10) 
{= 1, т 


where the symbols have the same meaning as in subsection 1, but now, with 
proportionate numbers in the subclasses, $n_{rs}$ can be written as $N a_r b_s$, where 
N is the total number of cases and the a’s and b’s represent the proportions 
of cases corresponding to columns (methods) and rows (schools), respectively 
(i.e., $\sum_r a_r = 1 = \sum_s b_s$). 

Since we have already dealt with the problem of heterogeneity of vari- 
ance and correlation in the previous subsection, we shall assume homogeneity 
of variance and correlation for the present version of our model. The results 
of a variance component analysis are then as shown in Table 2. 

Applying the null hypothesis, namely, 


д. = Ар (lel. 00), (11) 

we obtain for the B-ratio of the (M v. M X S)-test the expression 
ка — La NE bS + (р — Dee 

ЧО E a — EHS + (р — Dg — Dow’ 


where 


(12) 


S* = OU — р) + ec]. (13) 


Subtracting the denominator from the numerator of this expression we 
obtain the quantity 


ма = Dasa - 0 226^ — 2501 
-Nü- Xa)8[270,— 9] (u = To ‚Ф, 


(14) 


which is positive except for the case in which the b’s are all equal (when it 
becomes zero). 
An immediate conclusion to be drawn is that inequalities among the 
b proportions (i.e., the proportions for schools) introduce bias into the F-test. 
Also from the fact that the B-ratio is greater than unity, it is likely that the 
bias is positive (special cases confirm this). 

Having discovered this bias, we must ask: how does it arise? It is obviously 
due to the fact that the use of unequal proportions of pupils results in unequal 
weighting of the η- and ξ-terms when, of course, they should receive the same 
weighting. Also, for the same reason, inequalities among the a’s must also 
produce bias although no indication of this is given by a consideration of the 
B-ratio alone. (It will be seen in fact that this effect is equivalent to that of 
heterogeneity of variance within methods). 

How serious may the bias be? It will first be noted that, unlike the bias 
discussed in case (c) of subsection 1, the type of bias with which we are 
concerned here involves both the η- and ξ-terms which, together, are seldom 
negligible relative to the sampling error term (their relative effect will norm- 
ally be increased by the use of covariance or a similar technique). Therefore, 
with large inequalities, the bias may be far from negligible. 

An indication of the magnitude of the bias for unequal b proportions is 
the amount the B-ratio exceeds unity. This quantity can always be estimated 
for any practical case. To illustrate we shall use the data given by Lindquist 
in one of his examples (12, p. 120 et seq.). 


N = 440, p = 4, q = 5; 
$a_1 = a_2 = a_3 = a_4 = \tfrac{1}{4}$; 


10 30 _ 19 _ 35 13 
= пб hio 5*7 = = 


The analysis of variance reads: 


b, 


                      d. f.   Sum of Squares   Variance 
Methods                  3            988.6      329.5 
Schools                  4           1748.3      437.1 
Methods X Schools       12            172.8       14.4 
Within Classes         420           2981.5        7.1 
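
The entries of the table can be checked with a few lines of arithmetic. The sketch below is illustrative only: the degrees of freedom and sums of squares are copied from the table, while the variable names and the two F-ratios printed at the end are assumptions added for the purpose of the check and are not part of the original analysis.

```python
# Degrees of freedom and sums of squares copied from the table above.
table = {
    "Methods": (3, 988.6),
    "Schools": (4, 1748.3),
    "Methods X Schools": (12, 172.8),
    "Within Classes": (420, 2981.5),
}

# Variance (mean square) = sum of squares / degrees of freedom.
mean_squares = {name: ss / df for name, (df, ss) in table.items()}

# (M v. M X S)-test: methods against interaction.
f_methods_vs_interaction = mean_squares["Methods"] / mean_squares["Methods X Schools"]
# Interaction against within classes.
f_interaction_vs_within = mean_squares["Methods X Schools"] / mean_squares["Within Classes"]

for name, ms in mean_squares.items():
    print(f"{name:<20s}{ms:8.1f}")
print(f"F (M v. M X S)       = {f_methods_vs_interaction:.2f}")
print(f"F (M X S v. within)  = {f_interaction_vs_within:.2f}")
```

The recomputed variances agree with the last column of the table to one decimal place.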


The entries in the last column may be taken as estimates of the corresponding 
expressions in the last column of Table 1. Thus, by simple arithmetic we 
obtain the following estimates: 


с2 = 7.1; S = [oP о) + об] = 352; 
cè + TES. а— 3 a( 2, 02)8 = 16.6 

Note that for the given values of the b’s, the value of the B-ratio canno 
idend 1.3, the value of (y — 1) È, b/a = Èr bj. I woe 
However, the deviation of the B-ratio from unity is not in itself a 
measure of the magnitude of bias. Allowance must be made for the numbers 
of degrees of freedom involved in the F-test. The greater the numbers of 
degrees of freedom, the more important a given deviation becomes and vice 
versa. 

It is now tentatively suggested that in all applications of the B-ratio 
technique, the magnitude of bias is best measured by the expression 


ДВ +з 1| (15) 
F 1% — F 5% 
where $F_{1\%}$ and $F_{5\%}$ represent the values of F at the 1 per cent and 5 per cent 
umbers of degrees of freedom. It might 
any given design or model, how great 
before the bias becomes serious. 
sed in this subsection, the bias due to 


is (i.e., for the given numerical 
. example, against 16.6 instead of 14.4), [ 
and Cochran (3).] 


3. Unequal (Disproportionate) Numbers in Subclasses 


The literature on exact procedures for analyzing data of this type is 
now considerable, Tsao’ 
prehensive, However, th 


esearch have theref 


ore favored approximate 
al numbers in the s 


ubclasses—at least where 


(а) where the hypothesis tested 


columns of the data (Eisenhart's Model 1); 
(b) where a general hypothesis is tested, applying to a total population of 
rows or columns (Eisenhart's Mixed Model, etc.). 


Case (a) 


In this type of analysis, as was stated earlier in the paper, the two 
main F-tests are interaction v. within subclasses, and, when this test is not 
significant, columns (or rows) v. within subclasses. [Tsao (18) deals with other 
possible tests.] Before we proceed to derive the corresponding B-ratios, it 
is to be noted: 

(i) Since Snedecor’s method employs proportionate frequencies, the 
interaction term will not contain any component due to main effects (the 
characteristic of orthogonality), i.e., the interaction term is independent of 
the values of the main effects. Also the variance for columns will be inde- 
pendent of the main effects for rows and vice versa. 

(ii) In deriving the first of the two B-ratios, we assume interaction to 
be zero; and in the case of the second, we not only make this assumption 
but we also assume zero differences between the main effects involved in 
the F-test. 

It follows from (i) and (ii) that no serious loss of generality will be 
incurred (and a considerable saving in algebraic labor will be gained) if we 
straightway assume that interaction and the differences between main effects 
are zero; i.e., each observation, apart from a constant which we will here 
take to be zero, will consist only of sampling error and may be represented by 

$$X_{rst} = e_{rst} \qquad (r = 1, \ldots, p;\ s = 1, \ldots, q;\ t = 1, \ldots, n_{rs}), \tag{16}$$

where p denotes the number of columns, q denotes the number of rows, 
and $n_{rs}$ denotes the number of observations in the subclass (r, s). 

It will be assumed that the e’s in all subclasses may be regarded as 
random samples from an infinite population of e’s of zero mean and variance 
$\sigma_e^2$ (the usual assumption of homogeneity of variance). Thus, $\sigma_e^2$ is the E. V. 
of the variance within subclasses. 

Also let 

$$N a_r = \sum_s n_{rs}, \qquad N b_s = \sum_r n_{rs}, \qquad N c_{rs} = n_{rs} \qquad (r = 1, \ldots, p;\ s = 1, \ldots, q). \tag{17}$$
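Now let us derive the E. V.’s of the different sums of squares arising 
in Snedecor’s Method. The sum of squares between subclasses is given by 

$$\sum_r \sum_s N a_r b_s (\bar{X}_{rs} - \bar{X}_{..})^2, \qquad \bar{X}_{rs} = \frac{1}{n_{rs}} \sum_t X_{rst}, \quad \bar{X}_{..} = \sum_r \sum_s a_r b_s \bar{X}_{rs}, \tag{18}$$

which has E. V. 

$$\sum_r \sum_s N a_r b_s \frac{\sigma_e^2}{n_{rs}} - N \sum_r \sum_s a_r^2 b_s^2 \frac{\sigma_e^2}{N c_{rs}} = \sigma_e^2 \sum_r \sum_s \frac{a_r b_s}{c_{rs}} (1 - a_r b_s). \tag{19}$$

The sum of squares between columns is given by 

$$\sum_r N a_r (\bar{X}_{r.} - \bar{X}_{..})^2, \qquad \bar{X}_{r.} = \sum_s b_s \bar{X}_{rs}, \tag{20}$$

which has E. V. 

$$\sigma_e^2 \sum_r \sum_s \frac{a_r (1 - a_r) b_s^2}{c_{rs}}. \tag{21}$$

Similarly the E. V. of the sum of squares between rows is 

$$\sigma_e^2 \sum_r \sum_s \frac{a_r^2 b_s (1 - b_s)}{c_{rs}}. \tag{22}$$

By subtracting the sum of (21) and (22) from (19), we obtain the E. V. 
of the interaction sum of squares 

$$\sigma_e^2 \sum_r \sum_s \frac{a_r b_s}{c_{rs}} (1 - a_r)(1 - b_s). \tag{23}$$

It follows that the B-ratio for the F-test interaction v. within subclasses is 

$$\frac{1}{(p - 1)(q - 1)} \sum_r \sum_s \frac{a_r b_s}{c_{rs}} (1 - a_r)(1 - b_s). \tag{24}$$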

What values will this expression normally have? 

It will first be noted that, when $c_{rs} = a_r b_s$, the B-ratio is unity since 

$$\sum_r \sum_s (1 - a_r)(1 - b_s) = (p - 1)(q - 1). \tag{25}$$

This is, of course, to be expected since we are then dealing with proportionate 
numbers in the subclasses. 


„0, 
since for a given difference between $c_{rs}$ and $a_r b_s$, the expression 
$a_r b_s/c_{rs}$ will exceed unity by a greater amount when $c_{rs} < a_r b_s$ 
than it will be exceeded by unity when $c_{rs} > a_r b_s$. (ii) The B-ratio, in any 
particular case, may be regarded as a weighted mean of the pq values of the 
expression $a_r b_s/c_{rs}$ for that case; it, too, will tend to have a value greater 
than unity. 

It is, therefore, suggested here that the use of Snedecor’s Method will 
normally produce a positively-biased test of interaction. The amount of 
bias will be indicated by the deviation of the B-ratio from unity or, better, 
by the value of the expression suggested at the end of subsection 2. 

The B-ratio for the columns v. within subclasses F-test is 

$$\frac{1}{p - 1} \sum_r \sum_s \frac{a_r b_s}{c_{rs}} (1 - a_r) b_s. \tag{26}$$

It will be apparent that exactly the same can be said about this ratio as for 
the other. 
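
The two B-ratios for case (a) can be evaluated directly from a table of subclass numbers. What follows is a minimal sketch, assuming the weighted-mean forms of the two ratios given above, $[(p-1)(q-1)]^{-1} \sum\sum (a_r b_s/c_{rs})(1-a_r)(1-b_s)$ for interaction and $(p-1)^{-1} \sum\sum (a_r b_s/c_{rs})(1-a_r) b_s$ for columns, and assuming every subclass is non-empty. The function name `b_ratios` and the two small tables are hypothetical and serve only as illustration.

```python
def b_ratios(n):
    """Return (B_interaction, B_columns) for a p x q table of subclass counts.

    n[r][s] is the number of observations in subclass (r, s); r indexes
    columns (methods) and s indexes rows (schools), as in the text.
    Every n[r][s] must be positive.
    """
    p, q = len(n), len(n[0])
    total = sum(sum(col) for col in n)
    a = [sum(n[r]) / total for r in range(p)]                       # a_r
    b = [sum(n[r][s] for r in range(p)) / total for s in range(q)]  # b_s
    c = [[n[r][s] / total for s in range(q)] for r in range(p)]     # c_rs

    interaction = sum(
        a[r] * b[s] / c[r][s] * (1 - a[r]) * (1 - b[s])
        for r in range(p) for s in range(q)
    ) / ((p - 1) * (q - 1))
    columns = sum(
        a[r] * b[s] / c[r][s] * (1 - a[r]) * b[s]
        for r in range(p) for s in range(q)
    ) / (p - 1)
    return interaction, columns


# Proportionate numbers: c_rs = a_r * b_s, so both ratios should be unity.
proportionate = [[10, 20], [20, 40]]
# Disproportionate numbers with the same kind of layout.
disproportionate = [[10, 20], [20, 10]]

print(b_ratios(proportionate))
print(b_ratios(disproportionate))
```

For the proportionate table both ratios come out at unity, as the text requires; for the disproportionate table both exceed unity, in line with the suggestion that the bias is normally positive.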

Before we finish with case (a), it may be of interest to examine Tsao’s 
modification of Snedecor’s Method (18). Tsao “questions the validity of 
retaining the within variance derived from the original data while the other 
variances are derived from the adjusted data.” To judge from the simplified 
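case with which he deals at the end of his article, he would adjust the sum of 
squares within subclasses to the value 

$$\sum_r \sum_s \frac{N a_r b_s}{n_{rs}} \sum_t (X_{rst} - \bar{X}_{rs})^2 = \sum_r \sum_s \frac{a_r b_s}{c_{rs}} \sum_t (X_{rst} - \bar{X}_{rs})^2, \tag{27}$$

which will have E. V. 

$$\sigma_e^2 \sum_r \sum_s \frac{a_r b_s}{c_{rs}} (n_{rs} - 1) = \sigma_e^2 \Big( N - \sum_r \sum_s \frac{a_r b_s}{c_{rs}} \Big). \tag{28}$$

That is, the E. V. of his adjusted variance within subclasses is 

$$\frac{\sigma_e^2}{N - pq} \Big( N - \sum_r \sum_s \frac{a_r b_s}{c_{rs}} \Big) \tag{29}$$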


and not $\sigma_e^2$ as for Snedecor’s variance within subclasses. 

If we are agreed that $a_r b_s/c_{rs}$ will on the average be greater than unity, 
it follows that the above E. V. will normally be less than $\sigma_e^2$. It would appear 
therefore that Tsao's correction will on the average increase the bias of 
Snedecor's Method. 


Case (b) 

It will clarify the discussion if we think of the columns as methods and 
the rows as schools. Our problem then is to investigate the bias of the 
(M v. M X S)-test when Snedecor's approximate method is applied. 

Obviously a part of the bias produced will be of the type discussed in 
subsection 2 (provided, of course, there is either real interaction or a 
error). The rest of the bias will be of the type discussed in case (a). 
In order to study the importance of the latter type of bias in the 
(M v. M X S)-test, let us take the special case where there is no inter- 
action and error consists only of sampling error. Then, from the analysis of 
case (a), it will be seen that the B-ratio for the (M v. M X S)-test is 

$$\frac{\dfrac{1}{p - 1} \sum_r \sum_s \dfrac{a_r b_s}{c_{rs}} (1 - a_r) b_s}{\dfrac{1}{(p - 1)(q - 1)} \sum_r \sum_s \dfrac{a_r b_s}{c_{rs}} (1 - a_r)(1 - b_s)}. \tag{30}$$

Since both numerator and denominator may be regarded as weighted means 
of the same pq values of $a_r b_s/c_{rs}$, it follows that the B-ratio will vary about 
unity, the degree of variation diminishing with the increase in number of 
the rows and columns. Therefore, as far as this type of bias is concerned, it 
is likely that Snedecor’s Method will generally provide a more valid F-test 
for case (b) than for case (a). 

The complete B-ratio for Snedecor’s (M v. M X S)-test, when both 
types of bias are involved, can be written down without further calculation 
(cf. previous subsection). It is 

O ossis: аа) 
" 2 Jr 
NE 2, a, )(1— bF be 


а 


I~p)+o;"]+o2 У Deea —a,)(1—b,) 


In examining the bias increased by Snedecor’s Method, no mention 
has been made of the $\chi^2$ criterion for the applicability of the method. Snedecor 
established this criterion by empirical methods. Obviously the B-ratio, or 
rather some such expression as (15), could be established empirically as an 
alternative criterion. In dealing with the type of analysis discussed under 
case (b), it is possible that this alternative might prove superior. 

IV. Summary of Results 
The basic model is 


r= i, ‚р 
аА ИИИ ВИ, 
t 1, = s Nys 
1. Equal Numbers in Subclasses ($n_{rs} = n$) 


Three possible sources of F-test bias were considered: 

(a) Heterogeneity of variance within subclasses (from row to row). There 
is no evidence of bias in this case. The same applies to heterogeneity of 
variance of the ξ-effects from row to row. 

(b) Heterogeneity of variance within columns. No bias arises when only 
two columns are involved. When there are more than two columns, the bias 
is likely to be positive (for definition of positive and negative bias see p. 229). 

(c) Heterogeneity of correlation between B-effects (of one column with 
another). The bias in this case is positive. In the typical methods experiment 
replicated in a number of schools it is unlikely to be serious; but for an analysis 
of variance of repeated measurements the bias involved might be considerable. 


2. Proportionate Numbers in Subclasses ($n_{rs} = N a_r b_s$) 


For the given type of model (also for Eisenhart’s Model II and Mixed 
Model), proportionate numbers in the subclasses produce bias, again of a 
positive character. The amount of bias depends on the degree of inequality 
among the a and b proportions; also on the magnitude of the η- and ξ-vari- 
ances relative to the e-variance. Gross inequalities in the proportions are 
obviously to be avoided in setting up experiments. A formula, of general 
application, is suggested for measuring the magnitude of bias. 


3. Disproportionate Numbers in Subclasses 


In this case, F-test bias was studied for Snedecor's Method of Expected 
Proportionate Frequencies. Eisenhart's Model I was considered as well as 
the mixed model stated above. 

For Eisenhart's Model I, bias, if present, will normally be positive. 
When Tsao’s modification of Snedecor's method is applied, it would appear 
that the bias will on the average be increased. 

In the case of the Mixed Model, part of the bias arises in the same 
way as for Model I, but it is likely that, in general, it will not have the same 
importance. The other part of the bias is of the same nature as that discussed 
in section 2. 

It is suggested that, for the Mixed Model, the expression for measuring 
bias proposed at the end of section 2, might prove superior to the $\chi^2$-criterion 
as a test of the applicability of Snedecor's Method. 
LEAST SQUARES ESTIMATES 
AND OPTIMAL CLASSIFICATION 


HUBERT E. BROGDEN 


PERSONNEL RESEARCH BRANCH 
THE ADJUTANT GENERAL'S OFFICE 
DEPARTMENT OF THE ARMY* 


A simple algebraic development is given showing that criterion estimates 
derived by usual multiple regression procedures are optimal for personnel 
classification. It is also shown that, for any assignment of men to jobs, the 
sum of the multiple regression criterion estimates will equal the sum of the 
actual criterion scores. 


In earlier papers (1, 2), the author contended that estimates of job 
proficiency derived by least squares estimates will place men in jobs in the 
most efficient way possible with the given predictor battery available, and 
that the average estimated job proficiency obtained by the use of such 
least squares estimates will equal the average actual job proficiency of 
assigned personnel. This paper will seek to establish these two points in a 
more rigorous fashion. 


Definition of Symbols 


$C_{ij}$ = the performance of individual i in job j. 

$\hat{C}_{ij}$ = estimates of the $C_{ij}$, each derived by regression equations from the same 
battery of tests and the same universe of individuals. It is assumed that 
the zero- and higher-order regressions involving the tests and the $C_{ij}$ are 
linear.† 

$\bar{C}_{ij}$ = the average $C_{ij}$ value for a subset of individuals having the same pattern 
of scores on the battery of tests. 

X = an allocation matrix with elements, $x_{ij}$, taking on values of zero and one. 
The $x_{ij}$ entries for any individual have a single entry of one, and the $x_{ij}$ 
entries for job j have $Q_j$ entries of one. The remaining entries are zeros. 
The arrangement of ones in X corresponds to the placement of men in 
jobs. The use of X to symbolize any possible allocation of men to jobs is 
convenient and facilitates algebraic manipulation. In computing an 
allocation sum (to be defined), the cross-products of $C_{ij}$ and $x_{ij}$ are summed. 
When $x_{ij}$ is one, the corresponding $C_{ij}$ is included in the sum; when $x_{ij}$ 
is zero the corresponding $C_{ij}$ is excluded. Thus, X represents any arrange- 
ment of zeros and ones, good or poor, consistent with the limitations 
already imposed, except that such an arrangement must be based solely 
upon the scores on the battery of classification tests. In other words, 
X represents any allocation of men to jobs consistent with the conditions 
of the problem. 

$K_j$ = a set of constants, one for each job. The $K_j$'s are assumed to have numerical 
values such that, with allocation of each individual to the job in which 
($\hat{C}_{ij} + K_j$) is highest, the number allocated to each job will correspond 
to the number specified by the quota for that job. 

X' = a particular X, with the $x'_{ij}$ for each individual taking on a value of one 
for the job in which ($\hat{C}_{ij} + K_j$) is highest. X’ otherwise conforms to limita- 
tions imposed on X. 

$\sum_i \sum_j C_{ij} x_{ij}$ = the allocation sum. From the definition of an allocation matrix, it is evident 
that the allocation sum is equivalent to a simple sum, across all individuals, 
of the $C_{ij}$’s for the job to which each is assigned by a given allocation 
matrix. 

$Q_j$ = the quota for job j. 


*The opinions expressed are those of the author and are not to be construed as 
reflecting official Department of the Army policy. 

†In practice, the $C_{ij}$ would obviously not be available for all individuals in each job. 
Regression equations applying to the same universe can be estimated through a series of 
validation studies with a separate study being necessary for each job. In actual use the 
$\hat{C}_{ij}$ could then be computed for each applicant in each job. 


The Proof 
We seek to demonstrate that 


Consider a subset of 
on the battery of tests b 


and vi are to be based he test, scores, it follows that both will 
remain constant in i 


for such a subset, 
Э Gy + Kitu = 20 ON бат + b» Kiza) (1) 
= У 0, + E Ran) @ 


individuals 
asic to the 


Similarly, it follows tat. 


» the criterion mean for the subgroup, 


че derived from a linear regression equation, will 
approach zero, Consequently, 


‹ it is also true that, for the subset i Си, 
the sum of the criterion Scores, approaches equality tod Gg iz 
The basis for the equivalence of $\bar{C}_{ij}$ and $\hat{C}_{ij}$ within a subset having an 
identical pattern of scores might also be stated as follows: It is a basic principle 
of least squares prediction that the mean is the point at which the sum of 
the squares of the deviations is minimal. $\bar{C}_{ij}$, hence, is the best least squares 
estimate of the criterion scores of individuals with an identical pattern of 
test scores. If the regression system is linear, $\hat{C}_{ij}$ also provides the best 
least squares estimate. Hence, as N approaches infinity, the two must coincide. 

From our definition of X’, we know that, for such a subset 

$$\sum_i \sum_j (\hat{C}_{ij} + K_j) x'_{ij} \ge \sum_i \sum_j (\hat{C}_{ij} + K_j) x_{ij}. \tag{3}$$
From equations 2 and 3, 
> (at 27 би + È Ка) > x (s 22 Cut У Ка). ©) 
Substituting У), C;; for У), бү; , we obtain 
Les D Cu + 0 Ка) > Ve Vit 3 Кин). ©) 


We may also write 


25 (>; биа 3E 25 К!) 


23 o» Cutts + У Куд) 
25 o» Ciz; + 2 Кү). (7) 


IV 


Since (7) holds for any subset, it holds in summing over all individuals. 

In summing over individuals within any job, $K_j$ is a constant and may 
be factored out. Both $\sum_i x'_{ij}$ and $\sum_i x_{ij}$ are, from the definition of X’ and 
X, equal to $Q_j$. Hence, we have 


XO Cast; + EQ) 


25 o» С; + КО) 
D» (25 Cuat; + К) (8) 


IV 


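or 

$$\sum_i \sum_j \hat{C}_{ij} x'_{ij} + \sum_j K_j Q_j = \sum_i \sum_j C_{ij} x'_{ij} + \sum_j K_j Q_j \ge \sum_i \sum_j C_{ij} x_{ij} + \sum_j K_j Q_j \tag{9}$$

and, consequently, 

$$\sum_i \sum_j \hat{C}_{ij} x'_{ij} = \sum_i \sum_j C_{ij} x'_{ij} \ge \sum_i \sum_j C_{ij} x_{ij}. \tag{10}$$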


We have, then, established two generalizations. First, we have shown 
that, as N approaches infinity, the predicted criteria for a set of jobs derived 
by the use of linear multiple regression equations yield, upon assignment of 
men to jobs, an allocation sum that is equal to or higher than that obtained 
by any other assignment of individuals to jobs that is based on the test 
scores. Second, we have shown that, for any given assignment of men to 
jobs, the allocation sum obtained when regression estimates of the criterion 
are used becomes, as N approaches infinity, identical with that obtained 
when the criterion scores themselves are used. 
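
The first generalization can be illustrated numerically for a very small case. The sketch below is not the author's procedure and rests on several assumptions: the predicted criterion scores are invented, only two jobs are considered, there are no ties, and the constant for the second job is found by the simple device of ranking the differences between the two predicted scores. It compares the allocation obtained from the rule "assign each man to the job in which the predicted score plus the job constant is highest" with the best allocation found by trying every assignment that meets the quotas.

```python
from itertools import combinations

# Hypothetical predicted criterion scores c_hat[i][j] for 5 men and 2 jobs.
c_hat = [
    [52.0, 48.0],
    [47.0, 55.0],
    [60.0, 58.0],
    [41.0, 50.0],
    [55.0, 49.0],
]
quota = [3, 2]  # Q_j: number of men required in job 0 and in job 1


def allocation_sum(scores, job_of):
    """Sum of scores[i][j] over the job j to which each man i is assigned."""
    return sum(scores[i][j] for i, j in enumerate(job_of))


def brute_force_best(scores, quota):
    """Try every allocation that meets the quotas and return the best one."""
    n = len(scores)
    best = None
    for to_job1 in combinations(range(n), quota[1]):
        job_of = [1 if i in to_job1 else 0 for i in range(n)]
        total = allocation_sum(scores, job_of)
        if best is None or total > best[0]:
            best = (total, job_of)
    return best


def constant_rule(scores, quota):
    """Assign each man to the job with the highest c_hat + K_j.

    With two jobs, K_0 is set to zero and K_1 is placed midway between the
    two differences (job 1 minus job 0) that straddle the quota, so that
    exactly quota[1] men prefer job 1. Assumes no tied differences.
    """
    diffs = sorted((s[1] - s[0] for s in scores), reverse=True)
    k = [0.0, -(diffs[quota[1] - 1] + diffs[quota[1]]) / 2]
    job_of = [max(range(2), key=lambda j: s[j] + k[j]) for s in scores]
    return allocation_sum(scores, job_of), job_of


print(brute_force_best(c_hat, quota))
print(constant_rule(c_hat, quota))
```

In this invented example the two procedures select the same allocation, which is the behaviour the proof leads one to expect; the example does not, of course, replace the proof.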
AN IMPROVED METHOD FOR TETRACHORIC r 


W. L. JENKINS 
LEHIGH UNIVERSITY 


From the ratio of the cross-products of a fourfold table, with the appli- 
cation of two tabled corrections, tetrachoric r’s can be estimated with a mean 
discrepancy of less than .005 even when splits vary greatly from the medians. 
The necessary calculations can be handled by slide rule and the correction 
tables used without interpolation. 


Davidoff and Goheen (1) have recently published a table for estimating 
tetrachoric r’s directly from the ratio of the cross-products of a fourfold 
table without correction. Unfortunately, the method gives accurate answers 
only when both distributions are split at approximately their medians. 
When the splits are not close to the medians, the obtained r’s are always 
biased in the positive direction. With some extreme splits, the positive 
bias amounts to .10, .15, or more. 

However, it is possible to correct the obtained tetrachoric r’s by a method 
which is described and explained below. 


Method and Example 
1. Letter the fourfold table so that a is smaller than d and ad is greater 
than bc. 

(c) 43 | (d) 612 
(a) 32 | (b) 39 

2. Compute the cross-products ratio ad/bc. 
(32 X 612)/(43 X 39) = 11.68 
From Table 1 find the uncorrected tetrachoric r for the nearest value of the 
cross-products ratio. 
For 11.60, uncorrected r = .756. 

3. Compute the two marginal splits (a + b)/total and (a + c)/total. 
(32 + 39)/726 = .10     (32 + 43)/726 = .10 
TABLE 2 


Base Correction 


= 
Large Smaller Split 
Split 
10 11 12 13 14 15 16 17 18 19 20 22 24 26 28 30 32 34 36 38 40 42 44 45 
80 225 217 210 204 197 190 184 178 173 168 163 
78 |217 209 204 196 190 184 177 171 166 160 156 147 
76 |208 201 195 189 182 176 169 163 158 152 148 139 132 
74 |201 194 187 180 174 168 162 156 150 143 140 131 124 116 
72 |195 187 180 173 166 160 154 148 142 137 132 123 116 109 101 
70 |188 100 174 166 160 154 148 142 136 131 126 117 108 100 092 08: 
68 |182 174 168 160 154 148 142 136 130 124 120 109 100 091 084 076 070 
66 175 168 162 155 148 142 136 130 124 118 113 103 093 084 076 069 062 056 
64 |169 162 155 148 141 134 128 122 116 111 105 096 086 077 070 063 056 050 045 
62 163 155 148 141 134 128 122 116 110 105 100 089 080 072 064 057 050 045 040 035 
60 157 150 142 135 129 122 116 110 104 099 094 083 074 066 058 052 045 040 035 030 025 
58 |151 144 136 129 122 116 110 104 098 093 087 077 068 060 053 046 040 035 030 025 020 016 
56 |145 137 130 122 116 110 104 098 092 086 081 072 063 055 048 041 036 030 025 020 016 013 010 
54 |139 131 123 116 110 104 097 091 086 081 076 067 058 050 044 037 032 026 021 016 012 008 005 002 
52 |134 126 119 111 105 098 092 087 081 076 072 062 054 046 040 033 027 022 017 012 008 004 000 000 
50 |129 121 114 106 100 094 088 082 076 071 066 058 050 042 030 029 023 018 013 010 006 000 000 000 
48 |124 116 108 101 095 088 082 076 071 066 062 054 043 038 032 026 020 015 010 008 004 000 000 000 
46 |110 111 103 096 089 082 076 071 066 062 057 050 041 036 028 022 017 013 009 007 003 000 000 000 
44 |114 106 098 091 084 078 072 067 062 057 053 046 038 031 026 020 016 012 008 006 002 000 000 
42 |110 102 094 086 080 073 068 062 058 053 049 042 035 029 023 018 014 012 008 006 001 000 
40 |105 097 090 082 076 069 064 059 054 050 046 039 032 026 021 016 013 011 007 005 000 
38 |101 093 086 078 072 066 060 055 051 047 043 036 030 024 020 016 012 011 007 004 
36 |098 090 082 075 069 063 058 053 049 045 041 034 029 023 019 016 012 010 007 
34 |095 087 078 071 066 060 056 051 047 043 039 033 026 023 018 015 011 010 
32 |091 083 075 068 063 058 054 049 045 042 037 032 026 023 018 015 011 
30 |088 080 072 065 061 056 052 048 043 040 036 031 026 022 018 015 
28 |057 079 070 063 058 055 052 048 043 040 036 031 026 022 018 
26 |086 078 069 063 058 055 051 048 043 040 036 031 026 022 
24 |085 077 068 063 058 054 051 048 043 040 036 031 026 
22 |085 077 068 063 058 054 051 048 043 040 036 031 
20 |085 076 068 063 058 054 051 048 043 040 036 
18 |086 076 068 063 058 054 051 048 043 
16 |089 079 070 064 059 055 053 
14 |092 082 072 066 061 
12 |097 087 075 
10 |103 
5. Multiply the base correction by the multiplier to secure the final 
correction. 


.103 X .90 = .093 


6. Subtract the final correction from the uncorrected r to secure the 
corrected tetrachoric r. 


.756 - .093 = .663 
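
The arithmetic of the worked example can be retraced in a few lines. This is only a sketch of the calculation: the three tabled values (uncorrected r, base correction, multiplier) are copied from the example as quoted in the text and are not looked up from encoded tables, and the variable names are introduced here purely for illustration.

```python
# Cell frequencies of the fourfold table, lettered as in step 1.
a, b, c, d = 32, 39, 43, 612
total = a + b + c + d

# Step 2: cross-products ratio ad/bc.
ratio = (a * d) / (b * c)

# Step 3: the two marginal splits.
split_1 = (a + b) / total
split_2 = (a + c) / total

# Values read from the tables, as quoted in the worked example
# (the tables themselves are not encoded here).
uncorrected_r = 0.756    # Table 1, nearest tabled ratio 11.60
base_correction = 0.103  # Table 2, both splits at .10
multiplier = 0.90        # multiplier quoted in step 5

# Steps 5 and 6: final correction and corrected tetrachoric r.
final_correction = base_correction * multiplier
corrected_r = uncorrected_r - final_correction

print(f"ratio = {ratio:.2f}, splits = {split_1:.2f} and {split_2:.2f}")
print(f"final correction = {final_correction:.3f}, corrected r = {corrected_r:.3f}")
```

Rounded as in the text, the sketch reproduces the ratio 11.68, splits of .10 and .10, the final correction .093, and the corrected r of .663.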


Explanation 


Tables 1, 2, and 3 are derived from Pearson’s tables of normal correlation 
surfaces (2). For Table 1, cross-product ratios for median splits were com- 
puted for r's of .05, .10, .15, ..., .95, and a curve constructed relating r to 
the cross-product ratios. The figures given in Table 1 are scaled from this 
curve. 

Securing Tables 2 and 3 required a number of replottings of the Pearson 
data. Pearson’s tables are set up in 0.1σ steps; decimal steps of marginal 
proportions are needed. Accordingly, it was necessary to pick values that 
corresponded roughly to the desired marginal splits at various levels of r 
and obtain cross-product ratios. These were plotted and replotted until a 
family of curves was obtained that related the needed corrections to three 
variables: the two marginal splits and the uncorrected tetrachoric r. 

Table 2 is scaled from the family of curves according to steps of the 
two marginal splits, but for a single value of uncorrected tetrachoric r (.70). 
Except for such inaccuracies as may be introduced through repeated replot- 
tings, these corrections are precise when the uncorrected tetrachoric r is .70. 

To avoid having a book of such tables (one for each step of uncorrected 
tetrachoric r), it was necessary to resort to some approximations. When 
both splits are small (below .40) the correction depends chiefly on the diff- 
erence between the splits and the uncorrected tetrachoric r. When either 
split is large (above .40), the size of the smaller split (rather than their diff- 
erence) has the greater influence. Table 3 is set up accordingly, presenting 
multipliers to be applied to the base corrections of Table 2. 


Empirical check 


The adequacy of the method is shown by the results of an empirical 
check involving the recomputation of 500 r’s taken from the Pearson tables. 
Table 4 shows at the top the discrepancies of the uncorrected r’s (all positively 
biased) such as would be obtained if Table 1 were used without correction. 
At the bottom are shown the residual discrepancies after the corrections of 
Table 2 and Table 3 have been applied. Even without interpolation, 88 per 
cent of the residual discrepancies are less than .005. With interpolation this 
rises to 94 per cent. 
TABLE 4 
Empirical Check on the Adequacy of the Correction Method 


Discrepancies BEFORE correction (all positive) 


.000 .021 .041 -061 «081 „101 .121 „141 2161 


ku 00 о oeo oeo № № ә = 
-10 30 15 1 

.20 2 19 11 3 

.30 16 16 и 5 4 2 

40 и 12 и 10 6 3 2 1 
.50 10 12 9 4 9 5 2 2 3 
.60 9 11 » 6 т 4 4 3 5 
70 9 12 11 7 7 3 2 1 4 
80 9 i 6 6 3 4 4 3 

.85 12 8 1 6 6 

.90 12 9 3 


Discrepancies AFTER correction 


V Without interpolation With interpolation 

= 9 7 € 
r less .005 less ‚005 
ло 55 1 56 0 
20 52 4 54 2 
.30 49 5 52 2 
E 51 5 51 5 
+50 49 1 52 4 
+60 50 6 53 3 
ло 48 8 54 2 
.80 42 5 42 5 
.85 28 11 34 5 
BOOK REVIEWS 


BENJAMIN FRUCHTER. Introduction to Factor Analysis. New York: Van Nostrand, 1954. 
pp. xii + 280. $5.00. 


Several good books are already available in factor analysis. What claim can be 
made for another? Fruchter answers this in his preface. “These treatments have been 
found difficult by many otherwise competent students because of the mathematics and 
notation involved. It is hoped that this book will serve as an introduction to the subject 
and as a steppingstone to these more advanced texts.” 

The first four chapters provide a logical and mathematical introduction to factor 
analysis. Spearman’s two-factor theory and its generalization to Holzinger’s bi-factor 
method are discussed first. Cluster analysis is then considered as a means for understanding 
the logic of factor analysis. Next comes a chapter of “mathematics essential for factor 
analysis," including the basic matrix algebra operations and the geometry of rotation. 
This is followed by a chapter in which the basic equations of factor analysis are developed. 

The next four chapters present the principal computational procedures. The diagonal 
and centroid methods are given in one chapter; the multiple-group and principal-axes 
methods in another; orthogonal rotation in a third; and oblique rotation in a fourth. 

The final three chapters discuss (a) the interpretation of factors, (b) various applica- 
tions of factor analysis, (c) some of the controversial issues in contemporary factor analysis. 
The book concludes with a useful bibliography of 700 titles covering principally the period 
from 1940 (the year of Dael Wolfle's review) to 1952. 

Fruchter’s statement of factor analysis differs in two main ways from the books 
already familiar to the readers of Psychometrika. First, his account is briefer and probably 
simpler than that of any of his predecessors; secondly, it has a better claim to be a textbook, 
less claim to be a personal statement. 

Fruchter is undoubtedly right in saying that many otherwise competent students 
find factor analysis difficult because of the mathematics and notation. For many years to 
come, statements of factor analysis will be needed in which the approach is by means of 
the logic and calculations rather than by any rigorous mathematical development. 

Fruchter stresses (a) the practical applications of factor analysis, and (b) the com- 
putations. Ten examples of the use of factor analysis are given in the chapter entitled 
"Applications in the Literature." These are of an interesting diversity, ranging from investi- 
gations of conditioned responses and rat maze learning to prepsychotic personality traits 
and Supreme Court voting records. Q- and P-technique are represented as well as R-tech- 
nique. The chapter should be useful in reminding the psychological student that in studying 
factor analysis he must remain a psychologist. The computations in factor analysis are 
presented in detail in chapters 5 through 8. The various steps are itemized, and the instruc- 
tions are for the most part clear and straightforward, so that the student who works 
diligently through the presentation should be able to calculate a factor analysis in a re- 
search of his own. The more experienced factor-analyst will probably be glad to have these 
step-by-step descriptions both for his own reference and for supplying to the student 
who seeks his aid. 

A price is paid, naturally enough, for this emphasis upon learning by doing. For the 
most part the controversial issues of theory are eschewed. Key concepts are frequently 
introduced with so little discussion that the student may have trouble in seeing why the 
factor-analyst has adopted the particular procedure. For example, the account of com- 
munalities is brief and in my view very unsatisfying. The use of communalities is probably 
the factor-analytic procedure which has been most criticized by statisticians. The student 
whose knowledge is derived from this book will hardly be able to reply to any criticism. 
The distinction between common and specific variance is initially made (p. 45) without 
any mathematical or logical reason being supplied for its adoption, and the brief discus- 
sions on pp. 46-47 and pp. 51-52 might well serve to confuse rather than clarify issues 
for the student. For one thing, Fruchter points out that communalities enable one to repro- 
duce the correlations, and unities enable one to reproduce the original test scores; how- 


The discussion of orthogonal an 
distinction between simple axes and primary axes (i.e. 
is deferred until the final chapter, which is а pot-p 
earlier. Yet it is doubtful whether the student will get any real understanding of the tech- 
niques of oblique rotation presented in an earlier chapter without knowledge of this dis- 
tinction. Secondly, the controversy between those who favor orthogonal rotation and those who 
favor oblique rotation is also held over to the final chapter. Even then the arguments for 
both sides are summarized very briefly, with Fruchter making no attempt to adjudicate 
upon the issues. 


it has been a personal document as 
atement of his original contributions 
sed views, except sometimes by way 
hat factor-analysts are by no means 
Procedures and enters into logical and mathematical contro- 
versies with zest. Likewise, in Cattell, and in Holzinger and Harman space 
is found for personal contributions and points of view. 
is ending in factor analysis. Fruchter’s book 
haracteristic of factor analysis in the thirties and the 
the old disputes are settled. While the logi 
: ’s recent articles), the degree of 
ctors a i i "sica 
than a scientific issue, Ppears increasingly to be a metaphysical rather 
significance work?" or "Do the present tests 


?" does not get answers from Fruchter's text. 
e answers to all of these, but at least they 


University of Illinois 
Charles Wrigley 


ANNE ANASTASI. Psychological Testing. New York: Macmillan, xiii + 682, 1954. $6.75. 


The reviewer of a textbook serves essentially three functions. He attempts first to 
evaluate the soundness of the work from the point of view of accuracy, fundamental sound- 
ness, and good judgment in those areas where opinion rather than demonstrable knowledge 
is involved. Secondly, the reviewer must consider the book from the point of view of the 
audience for which it is intended and indicate whether he thinks it is suitable for the 
purpose stated. Finally, he must evaluate the book from the point of view of its original 
contribution to the total body of knowledge in the area covered. 

Concerning the soundness of the book the reviewer finds remarkably little with 
which to take exception. The point of view presented is conservative and scholarly. With a 
few minor exceptions, the material seems to be accurate and precise. Where individual 
judgment and evaluation enter the picture, these judgments are on the whole conservative 
and, while pointing out weaknesses, tend for the most part to be favorable toward tests 
and test authors. While it is, perhaps, no truer in the field of testing than in other fields, 
it certainly can be said that the construction and publication of a test is an exercise in 
compromise between what is theoretically right and desirable on the one hand and what is 
practical and feasible in terms of time and expense on the other. While the author of this 
book is fully aware of the need for improvement and makes many suggestions as to how 
this may be brought about, there is nothing in the book to discourage the potential author 
from undertaking the construction of a new test or to discourage the test publisher from 
expanding his offerings. 

The development of the book seems logical. Chapter II, “Principal Characteristics 
of Psychological Tests,” lays the groundwork for what is to follow. Some psychologists 
might take exception to the definition of a psychological test as “essentially an objective 
and standardized measure of a sample of behavior” on the basis that this definition is too 
comprehensive, including as it does almost every possible variety of test. In the writer’s 
opinion, achievement tests could be handled better as a separate category rather than 
as an aspect of psychological testing. Problems of reliability, validity, standardization, 
etc., are substantially different for achievement tests than for psychological tests in many 
instances. The American Psychological Association Test Standards Committee recognized 
this fact in leaving to the American Educational Research Association the production of 
a code for achievement tests. 

Dr. Anastasi says that one can “consider all tests as behavior samples from which 
predictions regarding other behavior can be made. Different types of tests can then be 
characterized as variants of this basic pattern.” She indicates further that one needs to 
be cautious in talking about measures of capacity, since capacity cannot be directly 
measured but can only be inferred from a measure of behavior. With this point of view, 
the writer of this review is in hearty agreement but he feels that the text has not gone far 
enough in indicating that many of the measures described are useful only if they are used 
to infer future behavior. This is certainly true of intelligence tests both of the general 
variety and the factor batteries and obviously true of prognostic and aptitude tests. 

In the writer’s opinion it may be considered one of the weaknesses of this text that 
insufficient attention is given to the basic problem of comparing such measures of capacity 
with subsequent measures of achievement. The problem of the criterion is discussed 
effectively but inadequate attention is paid to the problem of units in terms of which such 
pre- and post-measures can be compared. In fairness, it should be said that as much is done 
in this text as is generally done, perhaps more, in dealing with these problems. For example, 
a considerable section is devoted to expectancy charts, which is a noteworthy addition to 
what is ordinarily found in similar texts. 

The sections of the book which deal with various types of tests are particularly 
well done. The selection of tests used for illustrative purposes seems to be representative 
and sufficient information is given to provide the reader with a good notion of the various 
types of tests. 

With regard to the evaluation of the book from the point of view of the audience 
for which it is intended, the writer cannot speak with such complete single-mindedness. 
Dr. Anastasi defines the audience as “the general student of psychology” and says further 
“Today, familiarity with tests is required not only by those w 
but by the general psychologist as well.” 
that this textbook cannot be read intellige 
psychological testing without their having 
Even with such a prerequisite, 


rather than undergraduate classes and for students majoring in psychology rather than in 


Dr. Anastasi in her preface where she 
pposed by the present text ..." and 
miliarity with statistics, however, all 
xplained and illustrated. Such statistical 


ә 5 y should appear more meaningful to the beginner than 
they would if segregated into a special ‘statistical chapter.’ ” It appears to the writer that 


basic statistics. 

Dr. Anastasi indicates further + 
mber of fields, including the guid , school psychologist, psychometrist, 
personnel worker in business and industry and the clinical psychologist. With this point 
on. In fact, he would recommend the book as one 


LOUIS LEON THURSTONE 


With the death of L. L. Thurstone on September 29, 1955, psychology 
lost one of its greatest, a unique figure on the psychological scene and one to 
whom psychologists will always be indebted. If any psychologist of the past 
quarter century deserved to be called Mr. Psychological Measurement, it was 
he. His major professional objective coincided with that of the Psychometric 
Society and of Psychometrika, both of which were founded under his leadership: 
The development of psychology as a quantitative, rational science. By virtue 
of his own contributions and his influence on others, psychology has taken 
long steps in the direction of fulfillment of this objective. No major aspect of 
the field of measurement was untouched by him. 

Louis Leon Thurstone was born May 29, 1887, in Chicago, where in later 
years he spent the greater portion of his professional life and achieved his 
greatest distinction, at the University of Chicago. His parents were of native, 
Swedish stock, his father’s occupations being, in turn, military instructor, 
Lutheran pastor, editor, and publisher. Owing to a mobile family life, Thur- 
stone went to school in Illinois; Mississippi; Stockholm, Sweden; and James- 
town, New York. He attended Cornell University, where he specialized in 
engineering. Considering the few instances of which the writer has known in 
which psychologists have started from a base of engineering training, he has 
often thought that we should be better off if more psychologists had taken 
that educational route. 

It was during his engineering-school days that the problem of the learning 
curve, and hence psychology, caught Thurstone’s attention. On graduation, 
however, he was offered a position in the laboratory of Thomas A. Edison, 
where he spent the year of 1912. During the next two academic years, he 
taught engineering courses at the University of Minnesota, and there began 
his study of experimental psychology. Graduate work followed at the Uni- 
versity of Chicago. In 1915 he accepted an assistantship in the new and active 
laboratory established by Walter V. Bingham at the Carnegie Institute of 
Technology. He received his doctorate from Chicago, with a dissertation on 
the learning curve. His academic rise at Carnegie was something of a record. 
Beginning with the rank of instructor in 1917, with a promotion each year he 
became professor and head of the department by 1920. 

The year of 1923-24 was spent in Washington, D. C., with the Institute 
for Government Research, an agency devoted to the improvement of civil- 
service practices. From that time on, Thurstone had considerable influence, 
directly or indirectly, upon civil-service procedures. 

After his marriage in the summer of 1924 to Thelma Gwinn, Thurstone 
assumed his professorship at the University of Chicago. In the course of time, 
he had much to do with initiating and setting the pattern for the University’s 
distinguished Board of Examinations. In 1938 he was honored with the 
appointment as Charles F. Grey Distinguished Service Professor. In 1948 he 
was Visiting Professor at the University of Frankfurt, and in the spring 


1952, at which time he became Research Professor and Director of the Psy- 
chometric Laboratory at the 


professional affiliation at the time of his death. 
His unquestioned creative productivity can perhaps be attributed to 
certain traits that seem to stand out—his dissatisfaction with the status of 


, his originality was demonstrated relatively early. While 
ed a novel method for trisecting an angle 
attention of the writer whe 


Portant variables in a way that 
attending the seminar then 
did not found a school, nor did he deal in popular or spectacular subjects. He 
often spoke to “deaf ears” because so few psychologists were prepared by 
virtue of interest or mathematical preparation to listen. Perhaps his contribu- 
tions that have gained the widest notice are his attitude-scaling methods and 
attitude scales, and his discovery of primary mental abilities and his tests of 
them. Of the many quantitative methods that he has provided, probably that 
of multiple-factor analysis stands first by a wide margin in potential usefulness 
and impact upon psychology in general. 

It is not so well known that in the later years of his life his energies were 
very much devoted to performance tests of personality. This interest sprang 
in part from dissatisfactions with both projective tests and personality in- 
ventories. He was also working on material for a book that was to give a 
general treatment of psychological measurement and on revisions of his tests 
of primary mental abilities. It is hoped that many of these last efforts had gone 
far enough to reach publication. 

Thurstone was always quite willing to acknowledge the assistance from 
his wife, Thelma. They collaborated on many studies and for years were 
jointly responsible for the American Council on Education college-aptitude 
examination. Their three sons are grown and started on their various pro- 
fessional careers. Besides the immediate family, Thurstone leaves behind 
a great many loyal and capable former students, who follow in the Thurstone- 
Chicago tradition, as well as many admirers around the world. 


University of Southern California J. P. Guilford 


PSYCHOMETRIC THEORY: GENERAL AND SPECIFIC* 
LEDYARD R TUCKER 


PRINCETON UNIVERSITY 
AND 
EDUCATIONAL TESTING SERVICE 


Fellow members of the Psychometric Society, and our colleagues, 
members of Division 5 of the American Psychological Association, I under- 
stand my mission tonight is to provide you with both humor and a mild 
message. Precedent has established an eminently high standard. Maybe I 
can explain the humbleness with which I approach this task by relating the 
account of an episode. This occurrence, like much illustrative data in Psycho- 
metrika, is fictitious, of course. 

During my trip from the splendorous East Coast to the beauteous West 
Coast, I took vacation time to tour part of the magnificent country in be- 
tween, as many of you, undoubtedly, did also. One afternoon I came upon a 
small tent village in a fairly remote mountain section. Not that such camps 
were so uncommon as to be remarkable in themselves; something about this 
particular camp caught my eye. In the center of this camp was a rustic 
conference table flanked on one side by a large blackboard mounted on cedar 
posts. This in itself was enough to raise a moderate interest, but my curiosity 
was intensely stimulated by another observation. Scribbled on this black- 
board were a number of familiar symbols. Here was a sigma, there was a 
derivative, another place was a box labelled “factor matrix." How could 
this be? Needless to say I stopped to investigate. To my extreme pleasure 
I found myself among friends. A small group of Psychometricians had set 
up a summer seminar. There was nothing for me to do but to stay for dinner. 

Before and during dinner we chatted about this and that. There were 
a number of requests for news such as: Were the Brooklyn Dodgers still run- 
ning away with the race in the National League? Had Senator McCarthy 
started a new investigation? What were the leading plays in the summer 
theaters? After these matters had become settled, we sat back quietly for a 
period. This silence was ended by one member, whom, of course, I won't name, 
exclaiming: “Say, I've got one that should work. How about this? 751955.” 
There was a short pause for ten seconds while the others seemed to stop 
and contemplate the proposition. Then there was loud and long laughter. 


*Presidential Address to the Psychometric Society, September 5, 1955. 


In the enthusiasm of the moment, another member called out, “... would 
be like 683291.” Again there was laughter. 

to become a bit boring after the first week 
They had heard each other’s jokes 
tired of hiking and fishing. As an extra-curricular activity 

at I had seen going on. It was a life-saver 
would not have been able to put up with 
nd week. My weakness was that I had not 
not humorous. Worst of all I had picked 
st classification on the built-in criterion. I had told them that this 


With the field of humor now having been thoroughly studied, we may 
turn our attentions to other areas. For a short period I will direct my attention 
to material related to the title of this address, Psychometric Theory: General 
and Specific. Here I am i 
vations of psychological phenomena can 
be validly matched with expectations obtained from rational theories. 


are several aspects of my 
vations I mean the records obtaine 
ourse of phenomena or of the observation 


observations are subject to operational definitions. By 
expectations I mean statements describing the expected nature of observa- 
tions. Such statements might vary on a continuum between vagueness and 
definiteness. I wish to consider definite statements as constituting definite 
expectations. Any vagueness could be considered as the introduction of 
approximation in the expectation. Thus, we might have approximate expecta- 
tions. 

I might illustrate by reference to color matching. The theory of the 
color pyramid would indicate that a given combination of color disks in 
particular proportions on one color wheel would be matched by another 
group of color disks on another wheel in given proportions. When the colors 
of these disks and the various proportions are specified, a definite expectation 
is produced. A corresponding observation could be obtained in the laboratory. 
A subject would be shown the color wheel with the standard combination 
of disks. The second wheel could be supplied with the other set of disks and 
the subject be asked to adjust the proportions until the color produced 
matched the color of the first wheel. Our observation would contain the 
specifications of the situation and a record of the proportions established by 
the subject. 

I have avoided saying that observations were to be represented by 
rational theories. To make this latter statement would permit use of theories 
with infinite degrees of freedom so that these theories could be adjusted to 
represent any collection of observations, past, present and future. These 
theories would have no power, however, in providing definite statements of 
expectations concerning observations other than those observations the 
theories were adjusted to represent. In contrast, I have chosen to emphasize 
the correspondence between theoretically derived expectations and observa- 
tions. The power of a theory is directly related to the number of definite 
expectations it produces and inversely to the errors between these expectations 
and the observations. 

Let us consider the role of general psychometric theories in maximizing 
the extent to which derived expectations correspond to observations. Two 
aspects of this maximization appear: First, there may be a maximum cor- 
respondence, that is, minimum errors between the expectations and the 
observations. Second, the theory may yield a broad range of expectations. 
This is the place where general theories hold promise. We will all, undoubtedly, 
agree that it is desirable to have a systematic theory which will provide 
valid expectations for a large number of kinds of observations for many 
phenomena. There is a necessity, though, that we know how to use this 
theory to arrive at these expectations. Note my boner at the summer seminar 
previously described. As a further illustration, consider an advertisement 
that appeared a while ago in one of the weekly Princeton papers: 

Lost; A black female cocker spaniel. Will not answer to “Daisy,” 
although that is her name. 

Unless each theory explicitly indicates phenomena and observations to which 
it is applicable, we may be in no better position than would one looking 
for Daisy. I fear that in a number of instances we have mistaken vagu 

We may be interested in the relation between a method for analysis of 
data and a theory. In some senses a method of analysis might be considered 
as a general theory. A proper description of any such method includes a 
statement of the necessary conditions of situations in which the method 
might be employed. In a sense, however, we may contrast a method and a 
theory. This is in the specificity of definitions. If a construct may cor- 
respond to different kinds of observations in different situations, an ambiguity 
exists. A method that is applicable in several situations may be thought 
of as a theory in each situation. Each of these theories is separate from 
the other in that it includes the further definitions necessary to fit the method 
to the situation. Thus, a method may be considered as a family of theories. 
For example, multiple-factor analysis might be considered as a family of 
theories. Any time it is applied to a particular content area, it may be thought 
of as a theory. A method of analysis may be thought of as an abstraction 
from a number of situations with particular contents. In order to produce 
any particular theory from the family of theories represented by a method, 
it is necessary to specify particular contents. 

We may be interested in classifying methods of analysis as more or less 
general. This classification may be taken to correspond to the number of 
different situations in which the method is applicable. A more general method 
would be applicable in a larger number of different situations. However, 
there may be a trend that the more general the method is, in this sense, 
the more extensive the definitions need be in order to make the method 
explicitly applicable to any one particular situation. Thus, the more general 
methods may be further from theories. 

If a theory is to specify the observations and phenomena with which it 
deals, how then can a theory be general? We may wish to indicate the ex- 
tensiveness of a theory’s applicability by the contrasting terms: individual 
versus general. In case a theory concerns a large number of observables in 
a large variety of situations, the theory could be termed a relatively general 
theory. In case a theory concerns only a limited situation, the theory could 
be termed individual, or specific, to this situation. Note the particular usage 
of the word specific as synonymous to individual in this context. Both 


applies. 

Although one might hope that a collection of individual theories could 
be amalgamated into a more general theory, there is some possibility that 
the individual theories may be too divergent to provide a basis for this 
operation. Many individual theories arise from applications of psychometric 
methods to problems of applied psychology. The particular situations dealt 
with are defined by the needs of particular institutions. While this activity 
is a highly important aspect of psychometrics, its productivity of general 
theory is problematic. There is considerable danger that the practicing 
psychometrician may find himself in the kind of position illustrated by the 
following advertisement, which has also appeared in a weekly Princeton paper: 


Aspiring Artist: Will decorate children’s rooms. Funny animals a 
specialty, but will comply with any unreasonable request. 


It seems to me we should be on guard against at least the extremes of such 
situations. 

There seem to be problems, then, in our attempts at developing highly 
general theories. We may, in fact, produce instead of general theories families 
of individual theories which we might call psychometric methods. Or we 
may be so vague that the theory is unacceptable. Likewise, there are dangers 
in working with individual, or specific, theories. A recommended strategy 
is to attempt theories between these two extremes. The smaller, but not 
individual, theories may net us much in knowledge gained. These theories 
will be more easily established and experimented on by individuals. And 
finally we might hope that several such theories would be compatible for 
amalgamation with more general theories. 


In an earlier paper, a method of analysis, due to Neyman and now 
known generally as variance component analysis, was used to examine F-test 
bias for experimental designs in education of the randomized block type. 
The same method is now applied to study F-test bias for designs of the Latin 
square type. The results, in general, disprove the view that, for a valid appli- 
cation of Latin square techniques, it is necessary that all interactions are zero. 


In an earlier paper (5), a study was made of F-test bias for experimental 
designs in educational research of the randomized block type. In this paper, 
a similar study is made of those designs of which the simple Latin square 
is the prototype. The B-ratio technique, due to Neyman and described in 
the earlier paper, is again employed. 

As McNemar (8) points out, the usual textbook statement of the theory 
underlying the use of the Latin square implies zero interaction between 
the main effects. McNemar claims that where the assumption of zero inter- 
action is not met, investigators will obtain too many “significant F's”; he 
then goes on to conclude that, since significant interactions are so common in 
psychological research, the Latin square is seldom appropriate and that 
“it is defensible only in those rare cases where one has sound a priori reasons 
for believing that the interactions are zero." 

In the discussion which follows it will be shown that McNemar is by 
no means altogether correct in his point of view. It would appear that he has 
failed to realize that in the field of education and psychology, the application 
of the Latin square design has progressed beyond the usual simple textbook 
formulation and that some of the later applications show the need for 
modification of his rather sweeping generalization. In particular, it will be 
shown that the Latin square can be applied in several cases where the inter- 
actions are not zero; also, that in those cases where bias is present, it may well 
be negative and not positive as McNemar would maintain—a result which 
can only increase and not diminish the significance of any F-test. As defined in 
the earlier paper (5), an F-test is said to be positively or negatively biased, 
if, when the null hypothesis being tested is correct, it gives rise to a larger 
or smaller proportion, respectively, of significant F-ratios than is warranted 
by the F-distribution. 

It will be helpful to the treatment of our problem if we distinguish 
between the two main types of interaction that can occur in psychology: 


Type A—where each individual (or whatever the unit may 
be—class, grade, etc.) receives only one of the several treat- 
ments applied and is represented by only one measurement in 
the data to be analyzed. In this case, interaction is between 
main effects, such as treatments, schools, etc. 

Type B—where repeated measurements are made on the 
same individual or group. In this case conditions may differ 
or different treatments may intervene; earlier measurements 
(treatments, etc.) may affect, i.e., interact with, those that 
follow. 


The position is still further complicated in that Type B interaction 
may be accompanied by Type A. Also, besides pure interaction effects, the 
interaction component of any analysis of variance may contain other terms— 
described as group errors in the previous paper (5). 
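
The definition of F-test bias given earlier can be made concrete with a small simulation. The sketch below is not taken from the paper: it assumes a plain one-way layout (three groups of three normally distributed scores, no treatment effect), uses the tabled 5 per cent point of F for 2 and 6 degrees of freedom (5.14), and simply counts how often the F-ratio exceeds that point when the null hypothesis is true. The function names are hypothetical. An unbiased test should give a proportion close to .05; a positively biased test would give more, a negatively biased test fewer.

```python
import random


def one_way_f(groups):
    """F-ratio for a one-way analysis of variance with equal group sizes."""
    k = len(groups)
    n = len(groups[0])
    grand = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    ss_between = n * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Mean squares: between has k - 1 d.f., within has k(n - 1) d.f.
    return (ss_between / (k - 1)) / (ss_within / (k * (n - 1)))


def rejection_rate(reps=20000, k=3, n=3, critical=5.14, seed=0):
    """Proportion of F-ratios above the 5% point when the null hypothesis holds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # All groups share the same mean, so the null hypothesis is true.
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        if one_way_f(groups) > critical:
            hits += 1
    return hits / reps


rate = rejection_rate()
print(f"proportion of significant F-ratios: {rate:.3f}")
```

Replacing the data-generating step with one that violates the assumptions of the analysis, and comparing the resulting proportion with .05, is one way of checking the sign of the bias in a given design.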

1. Applications of the Latin Square Involving Type A Interaction Only 
1.1. An experiment comparing several methods of teaching some school topic 


For simplicity let us take the case of a 3 × 3 square, say: 


                 Streams 
              u    v    w 
          f   A    B    C 
Schools   g   B    C    A 
          h   C    A    B 

i.e., in each school there are three experimental groups which are subjected 
to the three methods A, B, and C and which can be classified according to 
some other factor (e.g., streams). (For the benefit of some readers it is to 
be explained that in many English schools the children in each grade are 
assigned to classes according to level of ability. The process is known as 
streaming. A three-stream school is one in which there are three classes in 
each grade representing three levels of ability.) It will be assumed that the 
numbers in each group are equal—to avoid bias as discussed in the earlier 
paper (5). 
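
The layout just described can be sketched in a few lines of code. The following is an illustration only, not part of the original analysis: it builds a 3 × 3 square by cyclic shifting and checks the defining property that every method occurs exactly once in each school (row) and once in each stream (column). The function names and the school labels f, g, h are assumptions made for the example.

```python
def cyclic_latin_square(treatments):
    """Build an n x n Latin square by shifting the treatment list one place per row."""
    n = len(treatments)
    return [[treatments[(r + c) % n] for c in range(n)] for r in range(n)]


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


schools = ["f", "g", "h"]   # rows (labels assumed for illustration)
streams = ["u", "v", "w"]   # columns
square = cyclic_latin_square(["A", "B", "C"])

print("  " + " ".join(streams))
for school, row in zip(schools, square):
    print(school, " ".join(row))
```

In practice the schools would be assigned to the rows at random; the cyclic square is used here only because it is the simplest one to write down.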

Then we may consider two hypotheses: either (i) a particular hypothesis, 
that the methods have the same mean effect when an average is taken for each 
method over the three schools and the three streams; or (ii) a general hy- 
pothesis, that the methods have the same mean effect when an average is taken 
over the total population of schools (of which the three given schools are a 
random sample) and the three streams. 

The first hypothesis is of little interest to the practical investigator. 
Furthermore, when real interaction is present between methods, schools 
and streams, the hypothesis cannot validly be tested by the Latin square 
design and analysis. There are obviously insufficient data: a factorial design 
is required. But it does not follow, as McNemar maintains, that the Latin 
square analysis would be positively biased. It might be positively biased in 
certain cases, but in others, it would be negatively biased, e.g., when real 
interaction exists only between two of the three classifications (the Latin 
square design then becomes effectively factorial). 

Now let us consider the bias involved in the Latin square analysis when 
used to test the general hypothesis. It is now necessary to think of the three 
schools as a random sample from the total population of schools (which, as 
usual, will be taken as being infinite in number). Also the three schools 
should be assigned at random to the rows of the Latin square. 

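Then the mean scores for the nine experimental groups might be 
represented as follows: 

$$
\begin{array}{ccc}
\pi_f + Q_u + T_A + \eta_{uA} + \varepsilon_1 & \pi_f + Q_v + T_B + \eta_{vB} + \varepsilon_2 & \pi_f + Q_w + T_C + \eta_{wC} + \varepsilon_3 \\
\pi_g + Q_u + T_B + \eta_{uB} + \varepsilon_4 & \pi_g + Q_v + T_C + \eta_{vC} + \varepsilon_5 & \pi_g + Q_w + T_A + \eta_{wA} + \varepsilon_6 \\
\pi_h + Q_u + T_C + \eta_{uC} + \varepsilon_7 & \pi_h + Q_v + T_A + \eta_{vA} + \varepsilon_8 & \pi_h + Q_w + T_B + \eta_{wB} + \varepsilon_9
\end{array}
\tag{1}
$$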


where 


(i) The general mean over the three methods, the three streams, and 
the total population of schools has been taken as zero. 

(ii) The $\pi$-values are the main effects of schools, the $\pi$-value for each 
school being the mean for the school taken over the nine method-stream 
combinations. (It will be assumed that the total population of $\pi$-values has 
zero mean and variance $\sigma_\pi^2$.) 

(iii) The $Q$'s are the main effects of streams over the three methods 
and the total population of schools, and $\sum_3 Q = 0$, where $\sum_n F$ is defined 
to be the sum of all the terms or expressions of type $F$, and $n$ denotes the 
total number of such terms. 

(iv) The $T$'s are the main effects of the methods over the three streams 
and the total population of schools, and $\sum_3 T = 0$. 

(v) The nine $\eta$'s represent the real interaction effects of the three methods 
and the three streams (over the total population of schools) and are such that 

$$
\begin{aligned}
\sum_3 \eta_{uk} &= 0 = \sum_3 \eta_{vk} = \sum_3 \eta_{wk} \quad (k = A, B, C), \\
\sum_3 \eta_{lA} &= 0 = \sum_3 \eta_{lB} = \sum_3 \eta_{lC} \quad (l = u, v, w).
\end{aligned}
\tag{2}
$$

(vi) The $\varepsilon$'s include real interaction between schools, on the one hand, 
and method-stream combinations, on the other, and group and sampling 
error. Each $\varepsilon$ could in fact be expressed as 

$$
\varepsilon = \zeta + \xi,
$$

where $\zeta$ is the real interaction term for the school and method-stream 
combination to which the given $\varepsilon$ belongs, and $\xi$ is a purely random term made 
up of group error and sampling error. 

The infinite population of $\varepsilon$'s (of $\zeta$'s and $\xi$'s) corresponding to any one 
cell of the Latin square will have zero mean. We shall assume that the nine 
populations of $\varepsilon$-values (one for each cell) have the same variance $\sigma_\varepsilon^2$ (the 
usual assumption of homogeneity of variance). 

The $\varepsilon$'s of the cells in any one row of the square will be correlated since 
their $\zeta$-components are correlated. A few words of explanation might be 
helpful. For each school there are nine $\zeta$-values (one for each method-stream 
combination) and their sum is zero; this follows from definition (i) above. 
Also, for each method-stream combination there will be an infinite population 
of $\zeta$-values (one $\zeta$ for each school). Furthermore, these nine populations of 
$\zeta$-values will be correlated because the sum of the nine $\zeta$-values for each 
school is zero. Since schools are assigned at random to the rows of the Latin 
square, it is not very difficult to see that, while the $\zeta$-values which appear in 
the three cells of any one row will be correlated with one another, they will 
not be correlated with the $\zeta$-values in the cells of the other two rows. 

The correlations (nine in all) between the $\varepsilon$-values may not all be the 
same. This will happen when heterogeneity of correlation exists between 
the $\zeta$-interaction effects. [A simple example of this type of heterogeneity is 
discussed more fully in the first paper (5).] 


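Let the correlations in the first row be $\rho_{12}$, $\rho_{13}$, $\rho_{23}$, and so on, and 
let $R = \sum_9 \rho$. The total sum of squares is 

$$
\sum_9 (\pi + Q + T + \eta + \varepsilon)^2 - \tfrac{1}{9}\Bigl[\sum_9 (\pi + Q + T + \eta + \varepsilon)\Bigr]^2. \tag{3}
$$

By equations above, 

$$
\sum_9 (\pi + Q + T + \eta + \varepsilon) = 3\sum_3 \pi + \sum_9 \varepsilon. \tag{4}
$$

It follows that the E. V. of total sum of squares is 

$$
6\sigma_\pi^2 + 3\sum_3 Q^2 + 3\sum_3 T^2 + \sum_9 \eta^2 + 8\sigma_\varepsilon^2 - \frac{2R}{9}\sigma_\varepsilon^2. \tag{5}
$$

The sum of squares between methods is 

$$
\sum_3 (M_A - M_B)^2, \tag{6}
$$

where 

$$
M_A - M_B = (T_A - T_B) + \tfrac{1}{3}\bigl[(\varepsilon_1 + \varepsilon_6 + \varepsilon_8) - (\varepsilon_2 + \varepsilon_4 + \varepsilon_9)\bigr], \tag{7}
$$

and 

$$
\text{E. V.}\,(M_A - M_B)^2 = (T_A - T_B)^2 + \tfrac{1}{9}\bigl[6\sigma_\varepsilon^2 - 2\sigma_\varepsilon^2(\rho_{12} + \rho_{46} + \rho_{89})\bigr]. \tag{8}
$$

Therefore, the E. V. of sum of squares between methods is 

$$
\begin{aligned}
&\sum_3 (T_A - T_B)^2 + 2\sigma_\varepsilon^2 - \frac{2R}{9}\sigma_\varepsilon^2 \\
&\quad = 3\sum_3 T^2 + 2\sigma_\varepsilon^2 - \frac{2R}{9}\sigma_\varepsilon^2 \quad \Bigl(\text{since } \sum_3 T = 0\Bigr).
\end{aligned}
\tag{9}
$$

Similarly, the sum of squares between streams has E. V. 

$$
3\sum_3 Q^2 + 2\sigma_\varepsilon^2 - \frac{2R}{9}\sigma_\varepsilon^2. \tag{10}
$$

The sum of squares between schools is 

$$
\sum_3 (M_f - M_g)^2, \tag{11}
$$

where 

$$
\begin{aligned}
M_f - M_g = (\pi_f - \pi_g) &+ \tfrac{1}{3}\bigl[(\eta_{uA} + \eta_{vB} + \eta_{wC}) - (\eta_{uB} + \eta_{vC} + \eta_{wA})\bigr] \\
&+ \tfrac{1}{3}\bigl[(\varepsilon_1 + \varepsilon_2 + \varepsilon_3) - (\varepsilon_4 + \varepsilon_5 + \varepsilon_6)\bigr],
\end{aligned}
\tag{12}
$$

and 

$$
\begin{aligned}
\text{E. V.}\,(M_f - M_g)^2 = 2\sigma_\pi^2 &+ \tfrac{1}{9}\bigl[(\eta_{uA} + \eta_{vB} + \eta_{wC}) - (\eta_{uB} + \eta_{vC} + \eta_{wA})\bigr]^2 \\
&+ \tfrac{2}{3}\sigma_\varepsilon^2 + \tfrac{2}{9}\sigma_\varepsilon^2(\rho_{12} + \rho_{13} + \rho_{23} + \rho_{45} + \rho_{46} + \rho_{56}).
\end{aligned}
\tag{13}
$$

It follows, on reduction, that the E. V. of the sum of squares between schools is 

$$
6\sigma_\pi^2 + \tfrac{1}{3}\sum_3 (\eta_{uA} + \eta_{vB} + \eta_{wC})^2 + 2\sigma_\varepsilon^2 + \frac{4R}{9}\sigma_\varepsilon^2. \tag{14}
$$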


The E. V. of the residual sum of squares can now be found by subtraction, 
but it can also be found directly after the manner of the other variance 
components. (The second procedure provides a check on the algebra in that 
the E. V. for total should equal the sum of the other E. V.'s.) By either method 
the E. V. for residual is 

$$
\tfrac{1}{3}\sum_3 (\eta_{uA} + \eta_{vC} + \eta_{wB})^2 + 2\sigma_\varepsilon^2 - \frac{2R}{9}\sigma_\varepsilon^2. \tag{15}
$$

It is to be noted that 

$$
\sum_9 \eta^2 = \tfrac{1}{3}\sum_3 (\eta_{uA} + \eta_{vB} + \eta_{wC})^2 + \tfrac{1}{3}\sum_3 (\eta_{uA} + \eta_{vC} + \eta_{wB})^2.
$$

This is easily deduced from (2). 
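The decomposition of the interaction sum of squares noted above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the helper names (`double_center`, `letter_totals`) and the raw table of numbers are purely illustrative assumptions. It builds a 3 X 3 table satisfying the constraints (2) and confirms that the sum of the nine squared interaction terms splits into the two sets of squared three-term sums, one set for each of the two 3 X 3 Latin square arrangements.

```python
from fractions import Fraction

def double_center(m):
    """Return a table with zero row and column sums, i.e. satisfying constraints (2)."""
    n = len(m)
    grand = sum(sum(row) for row in m) / Fraction(n * n)
    row_means = [sum(row) / Fraction(n) for row in m]
    col_means = [sum(m[i][j] for i in range(n)) / Fraction(n) for j in range(n)]
    return [[m[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]

def letter_totals(eta, square):
    """Sum eta over the three cells that fall in each row (school) of a square.

    square[r][l] is the method index placed in stream l for school r;
    eta[l][k] is the interaction of stream l with method k.
    """
    return [sum(eta[l][square[r][l]] for l in range(3)) for r in range(3)]

SQUARE_1 = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]  # rows ABC / BCA / CAB
SQUARE_2 = [[0, 2, 1], [1, 0, 2], [2, 1, 0]]  # rows ACB / BAC / CBA

# Arbitrary illustrative numbers (an assumption), turned into valid interaction terms.
raw = [[Fraction(v) for v in row] for row in [[4, -1, 2], [0, 3, 5], [7, 1, -2]]]
eta = double_center(raw)

total = sum(x * x for row in eta for x in row)
part_1 = sum(d * d for d in letter_totals(eta, SQUARE_1)) / 3
part_2 = sum(d * d for d in letter_totals(eta, SQUARE_2)) / 3
print(total == part_1 + part_2)
```

Exact rational arithmetic is used so that the equality can be tested without rounding error.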

We are now in a position to apply the B-ratio technique to examine the 
bias involved in the F-test methods v. residual. The B-ratio for this test is 
obtained by (i) applying the null hypothesis that the main effects for methods 
are equal, i.e., $T_A = T_B = T_C$ ($= 0$ since $\sum_3 T = 0$), and (ii) taking the 
ratio of the E. V. of the methods variance to the E. V. of the residual variance. 
It will be seen that it has the value 

$$
\frac{\bigl(1 - \tfrac{R}{9}\bigr)\sigma_\varepsilon^2}{\bigl(1 - \tfrac{R}{9}\bigr)\sigma_\varepsilon^2 + \tfrac{1}{6}\sum_3 (\eta_{uA} + \eta_{vC} + \eta_{wB})^2}, \tag{16}
$$

which will normally be less than unity if there is any real interaction between 
methods and streams. For the 3 X 3 Latin square, there are essentially two 
random arrangements: 


    A  B  C             A  C  B 
    B  C  A     and     B  A  C 
    C  A  B             C  B  A. 


For the second arrangement, the B-ratio is of the form 

$$
\frac{\bigl(1 - \tfrac{R}{9}\bigr)\sigma_\varepsilon^2}{\bigl(1 - \tfrac{R}{9}\bigr)\sigma_\varepsilon^2 + \tfrac{1}{6}\sum_3 (\eta_{uA} + \eta_{vB} + \eta_{wC})^2}.
$$


[...] paper, this heterogeneity produces positive bias. Whether the combined 
effect of the two types of bias is positive or negative will depend on various 
factors. But the point to be noted is that, for this application of the 
Latin square, the bias, if not negative, is at least less than that for the 
factorial type of experiment. If the bias is unimportant in the one case, it must 
a fortiori be unimportant in the other. 
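To make the dependence of the B-ratio on the arrangement concrete, the sketch below evaluates it for both 3 X 3 arrangements. It assumes the B-ratio has the form (1 - R/9)σ² divided by [(1 - R/9)σ² + (1/6) × the sum of squared three-term interaction sums], where the three-term sums entering the residual of one arrangement are those formed along the rows of the other arrangement. The interaction table, the error variance and the value of R are hypothetical numbers chosen only for illustration.

```python
# Cells are (stream, method) index pairs: streams u, v, w -> 0, 1, 2; methods A, B, C -> 0, 1, 2.
ROWS_OF_FIRST = [[(0, 0), (1, 1), (2, 2)], [(0, 1), (1, 2), (2, 0)], [(0, 2), (1, 0), (2, 1)]]
ROWS_OF_SECOND = [[(0, 0), (1, 2), (2, 1)], [(0, 1), (1, 0), (2, 2)], [(0, 2), (1, 1), (2, 0)]]

def b_ratio(eta, residual_triples, sigma2, r_sum):
    """B-ratio under the assumed form: base / (base + interaction part of the residual)."""
    base = (1 - r_sum / 9) * sigma2
    interaction = sum(sum(eta[l][k] for l, k in triple) ** 2
                      for triple in residual_triples) / 6
    return base / (base + interaction)

# Hypothetical interaction effects eta[stream][method]; every row and column sums to zero.
eta = [[1, -1, 0],
       [-1, 0, 1],
       [0, 1, -1]]

# The residual of each arrangement carries the three-term sums of the other arrangement.
b_first = b_ratio(eta, ROWS_OF_SECOND, sigma2=1.0, r_sum=0.0)
b_second = b_ratio(eta, ROWS_OF_FIRST, sigma2=1.0, r_sum=0.0)
print(b_first, b_second)
```

With these particular numbers the interaction falls entirely into the residual of the first arrangement, so the first B-ratio is well below unity while the second equals unity; neither exceeds unity.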


Unequal numbers of cases in the cells of the Latin square (whether 
proportionate or disproportionate with regard to the three classifications, 
schools, etc.) will also introduce bias into the F-test. This type of bias is 
discussed in the earlier paper for the case of the replicated (factorial) type 
of experiment. 


1.2. Alternative designs 


A single Latin square is not a practical design for a methods experiment. 
It provides too few degrees of freedom for the estimation of the residual or 
error variance. To obtain increased precision, two courses are available: 
either (1.2a) the replication of the same type of square; or (1.2b) the use of 
two or more different types of square with or without replication. In both 
cases, each square and each replication of it requires a separate sample of 
schools. 

There would appear to be no special merit in preferring course (1.2a) 
to (1.2b) as some investigators have done, under the belief, presumably, 
that the use of as many different squares as possible is a necessary part of 
the randomization process. It is quite sufficient for randomization that the 
schools are allocated at random to the rows of a square [whether the same 
square throughout as in (1.2a) or to different squares as in (1.2b)]. 


1.2a. The replication of the same square 


For simplicity, we will again take the 3 X 3 square discussed in section 
1.1. Let there be n replications, involving, therefore, 3n schools in all. Then 
by an analysis very similar to that of section 1.1, it can be shown that the 
expected values for the different components of the variance analysis are as 
in Table 1 (same notation as before). 

Two observations can be made: (i) By testing methods against residual 
within schools instead of square residual, a much more precise test is obtained. 
Furthermore, this F-test cannot be biased (negatively) as a result of any 
real interaction between methods and streams. It will, however, like the other 
test, be subject to any positive bias arising from heterogeneity of correlation 
between the cells in any one row. (ii) A test of interaction between methods 
and streams is provided by the F-tests: square residual v. residual within 
schools and rows v. residual between schools. 

If interaction were shown to be present, and it were desirable that the 
methods should be compared for the three streams separately (instead of an 
average being taken as for the null hypothesis tested above), this could 
easily be achieved by analyzing the results for each stream separately. 
The precision of the tests involved would, however, be poor since school 
differences would be contained in the error variances. 


1.2b. Use of two or more types of square with or without replication 


An analysis similar in type to (1.2a) can again be carried out. The precise 
form the analysis takes will, of course, vary with the types of squares selected 
and the numbers of replications. To save space no such analysis is reproduced 
here. It might, however, be pointed out that, where 3 X 3 squares are 
involved, there are only two possible types of square and with the same number 
of replications of each, the analysis bears a certain similarity to that 
discussed on p. 283 et seq. (See also Table 2.) The conclusions to be drawn with 
regard to bias are the same as for case (1.2a). 


1.3. More complicated applications 


Consider the designs given in Figure 1. 


[FIGURE 1. Design 1a and Design 1b: layouts with schools as rows, divided into boys-only and girls-only schools.] 


Design 1a, or something very similar, was used by Burt and Lewis (2). 
Once again it can be shown that the B-ratio for design 1a is less than unity, 
although as in all the other cases discussed, positive bias can result from 
heterogeneity of correlation between the four cells in each of the eight rows. 

But the same cannot be said for design 1b. In this case it can be shown 
that with real interaction present, the B-ratio is likely to be greater than unity: 
a more serious degree of positive bias may, therefore, be present in the F-test. 


2. Applications of the Latin Square Involving Type B Interaction 


Most of the Latin square studies reported in journals have been of the 
type where the subjects of the experiments have been subjected to a suc- 
cession of treatments and tested for each treatment: they could therefore 
involve Type B interaction [cf. Thomson (11), Sutherland (10), Grant (6), 
Edwards (4), and Archer (1)]. 

The Latin square design requires the order of succession of treatments 
to vary for the individuals and thus makes the problem of interaction more 
complicated than that which deals with Type A interaction only. 

2.1. The case of a single Latin square 


For example, consider three individuals subjected successively to three 
treatments A, B, and C in the following orders: 


                 1    A    B    C 
Individuals      2    B    C    A 
                 3    C    A    B 


With interaction present (whether type A or B), as stated earlier, the square 
does not provide sufficient data to test any worth-while hypothesis for the 
three individuals themselves. What of a general hypothesis about the population of which 
our three cases are a sample? Once again the data are insufficient if type B 
interaction is present. The square involves only three of the possible six 
orders for the treatments, and all six orders would have to be considered in 
order to obtain a worth-while generalized result. We will examine the latter 
possibility presently. The only conclusion to be drawn is that a single Latin 
square is of little use when type B interaction is present or suspected. This, 
of course, is in accord with McNemar's point of view. 


2.2. Replication of the same square 


In this case each of the treatment sequences is applied not just to one 
individual but to a group of individuals. The applications of both Thomson 
(11) and Sutherland (10) fall into this category. (Sutherland's square is a 
Greco-Latin square but the same principles apply.) The method was also 
used by Corrigan and Brogden (3), whose application was discussed by 
Grant (6). Edwards (4) also illustr[...] 

[...] that already discussed on p. 279. Schools is replaced by individuals, 
and rows now represents groups [...] treatment sequences. Streams is replaced 
[...]. There are some important differences which, to save space, we will 
only indicate: 

(i) Besides possible type A interaction terms, there may be also type B 
[...] the first four variance components (see [...] 

(ii) [...] earlier analysis [...], i.e., errors, other than random [...], 
peculiar to a school [...] and, therefore, affected [...] variance components. 
But in the [...] case, it will be seen that group errors will vary from cell 
to cell of the square but will be the same for [...] 


NEIL GOURLAY 283 


What tests may be applied and what is the position with regard to bias? 
(i) An important test is that of square residual (or uniqueness) against residual 
within individuals. If significant, this may indicate either type A or B inter- 
action, or group errors, or some combination of the three; the analysis cannot 
differentiate. With a significant result, there is little point in proceeding 
further. The test treatments v. residual within individuals would have such a 
limited interpretation that it would be virtually valueless; as a test of a general 
hypothesis about treatments, the test treatments v. square residual would be 
biased to an unknown extent. 

(ii) If non-significance is obtained for the test of square uniqueness, a 
further test of zero interaction (and group error) is provided by taking 
rows (groups or sequences) against residual between individuals, provided the 
groups were random in the first place. 

(iii) With both these tests non-significant, the other main effects might 
be tested against residual within individuals. But the reader must be warned 
against following such a test sequence blindly. It is to be remembered that a 
statistical test cannot prove the null hypothesis on which it is based; although 
the two preliminary tests are non-significant, it may still be the case that 
interaction (and/or group error) is present. A priori knowledge as to the 
likelihood of interaction and group error is obviously important. Where 
past experience would suggest that no interaction or group error is likely to 
be present, and the two preliminary tests confirm this, the tests of main 
effects against residual within individuals can be made with some safety. 
But where interaction or group error is known to be likely, little reliance 
can be placed on the tests of main effects, even though the preliminary tests 
give a non-significant result. In other words, the given experimental design is 
almost useless for dealing with this situation. 

Corrigan and Brogden’s data (3) show non-significance for both pre- 
liminary tests. Sutherland's data (10) show significance for the second test 
(the groups were not random) and would appear to show significance also for 
the first test; this would, of course, invalidate his other tests. 

Edwards (4, p. 325) seems to regard square residual and residual within 
individuals as estimates of the same variance but, as we have seen, this can 
only be the case when interaction and group errors are zero. 


2.3. Analysis involving complete sets of squares 


When type B interaction is present or suspected, it is obvious that all 
possible treatment sequences must be considered if a generalized result is 
to be obtained. Grant (6) discusses this case. 

Once again, for simplicity, we will consider the case of the 3 X 3 square. 
We will assume that individuals are assigned at random to the rows. There 
are then effectively only two Latin squares involved, corresponding to the 
six possible orders of treatment ABC, BCA, CAB and ACB, BAC, CBA. 
For convenience we will give the eighteen cells of the design the numbers 
1 through 18, in the order just stated, for the six sequences. 
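The statement that the six possible orders make up two Latin squares, and the numbering of the eighteen cells, can be illustrated as follows. This is only a sketch: the function names and the `cells` dictionary are illustrative choices, and the cell numbering simply follows the order of sequences given above.

```python
from itertools import permutations

def cyclic_square(first_row):
    """Rows obtained by shifting the first row one step to the left at a time."""
    n = len(first_row)
    return ["".join(first_row[(i + s) % n] for i in range(n)) for s in range(n)]

def is_latin(rows):
    """True if every row and every column contains each symbol exactly once."""
    n = len(rows)
    symbols = set(rows[0])
    return (all(set(r) == symbols for r in rows)
            and all({r[j] for r in rows} == symbols for j in range(n)))

orders = ["".join(p) for p in permutations("ABC")]   # all six treatment orders
square_1 = cyclic_square("ABC")                      # ABC, BCA, CAB
square_2 = cyclic_square("ACB")                      # ACB, CBA, BAC

# Number the eighteen cells 1 through 18 in the order of sequences stated in the text.
cells = {}
number = 1
for sequence in ["ABC", "BCA", "CAB", "ACB", "BAC", "CBA"]:
    for position, treatment in enumerate(sequence, start=1):
        cells[number] = (sequence, position, treatment)
        number += 1

print(square_1, square_2, cells[18])
```

Each of the six orders appears in exactly one of the two squares, which is why only two squares are effectively involved once individuals are assigned at random to rows.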

We will consider the case of n replications of the design, i.e., n individuals 
will be assigned to each row (or sequence). Then each of the n scores in any 
one of the eighteen cells may be represented as the sum of six terms of the 
form 

$$
\pi + Q + T + \eta + \xi + \varepsilon, \tag{17}
$$

where 

(i) The general mean over the eighteen cells and the total population of 
individuals (assumed infinite) has been taken as zero. 

(ii) The $\pi$-values (6n in all) are the main effects of individuals averaged 
over the six sequences. (It will be assumed that the total population of 
$\pi$-values has zero mean and variance $\sigma_\pi^2$.) 

(iii) The $Q$'s ($Q_1$, $Q_2$, $Q_3$) are the main effects of columns, averaged 
over the six sequences and the total population of individuals, and $\sum_3 Q = 0$. 

(iv) The $T$'s ($T_A$, $T_B$, $T_C$) are the main effects of treatments, averaged 
over the six sequences and the total population of individuals, and $\sum_3 T = 0$. 

(v) The $\eta$'s (18 in all) represent [...] interaction for the 18 cells and are 
such that the six sums of six $\eta$-terms [...] are zero. 

(vi) The $\xi$'s (18 in all) represent possible group or cell errors. (It will be 
assumed that the total population of $\xi$-values has zero mean and variance 
$\sigma_\xi^2$.) 

(vii) The $\varepsilon$'s (18n in all) represent the residuals within cells after all the 
other effects have been taken out. It will be assumed that the total population 
of $\varepsilon$'s for each cell has zero mean and variance $\sigma_\varepsilon^2$. Then the $\varepsilon$'s of the 
cells in any one row will be correlated. Let $\rho_{12}$, $\rho_{13}$, $\rho_{23}$ represent the 
correlations for the first row and so on. Also let $R = \sum_{18} \rho$. 

[...] different sums of squares in the variance analysis [...] the results are 
given in Table 2. [...] from this analysis? 

(i) [...] there was other evidence [...] to show [...] 

(ii) It is always possible to test treatments (or columns) against residual 
between cells. The B-ratio for this test is never greater than unity. In this 
respect the present analysis differs from that for the replication of the same 
square (see previous section), where type B interaction may affect both 
treatments (or columns) and residual between cells to give a B-ratio (and 
therefore a bias) of unknown size. 

Before concluding this section it might be of interest to mention that 
type B interaction is similar to the carry-over effect studied by statisticians 
in animal science [cf. Patterson (9) and Lucas (7)]. Further, the experimental 
model which they consider is very much the same as that treated in this 
section. Their analysis of variance, however, follows quite a different pattern 
and permits the testing of a wider range of hypotheses. One of their findings 
is that there is no bias involved in the testing of unadjusted direct effects 
against the error variance. This agrees with conclusion (ii) above. (It must 
be noted that group error does not occur in the animal science experiment.) 


2.4. More complex designs. 


No attempt will be made to consider F-test bias for analyses of more 
complex designs. It should now be apparent that where tests rest on the 
assumption of zero interaction and group error, the design should provide a 
test of this assumption. Also, in cases where such a test proves significant 
(or where the presence of interaction or group error is known to be likely 
even though unrevealed by any test), the design should furnish tests of 
main effects, which, although less [...] have been used, possess B[...] 

Archer (1) show[...] applications of the designs he offers in his paper [...] 
could be partially overcome by ensuring [...] his methods provide [...] 


CHARACTERISTICS OF TWO MEASURES OF PROFILE 
SIMILARITY 


CHESTER W. HARRIS 
UNIVERSITY OF WISCONSIN 


Analogs of Pearson’s coefficient of racial likeness and of Mahalanobis’ 
distance measure have been proposed as descriptive statistics for comparing 
two individuals. This paper shows that two different definitions of 
"uncorrelated" variables, one associated with an inverse transformation and 
the other with a principal-axis transformation, give rise to these two descrip- 
tive statistics. The effects of putting the data into certain forms, such as 
equalizing the variances of the variables or equalizing the means of the 
persons, prior to using either of the two transformations, are discussed. 


The interest in measures for assessing similarity (or dissimilarity) of 
profiles is reflected in such recent summaries and discussions as those of 
Osgood and Suci (5), Gaier and Lee (3), Webster (8), Cronbach and Gleser 
(2), and Thorndike (7). Several of these papers consider the problem of 
similarity of profile for two individuals, as contrasted with two groups. 
The latter problem may be formulated as one of discrimination between 
groups, and, as Cronbach and Gleser point out, two well-known approaches 
to its solution have been tried. One is the Pearson coefficient of racial 
likeness and the other the Mahalanobis distance measure, which is known to 
be related to Fisher’s discriminant function. The analogs of these two measures 
for the problem of comparing two individuals have been suggested in the 
discussions mentioned above. For one, Osgood and Suci and, independently, 
Cronbach and Gleser have suggested a measure that is analogous to the 
Pearson CRL; also, Cronbach and Gleser suggest a Mahalanobis-type 
measure and compare and contrast it with the former. 

The purpose of this paper is to examine these two proposed measures 
of profile similarity as descriptive statistics. In order to do this, the concept 
of Euclidean distance will be reviewed, a matric notation developed to 


describe a distance measure, and then the distinction between these two 


measures considered as a distinction between two definitions of uncorrelated 
tain forms of the data upon 


variables. Finally, the effects of adopting cer 
these measures will be outlined. The treatment here follows from well- 


known principles of matrix algebra, and consequently does not offer any 
strictly new propositions. However, it does clarify certain characteristics 
of these two proposed measures and in so doing may assist in describing them 


as descriptive statistics. 
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Euclidean Distance 


The Euclidean distance between two points in space is a well-defined 
concept that has been generalized to a space of any size. Providing that 
the space, of size k, say, has been defined by a rectangular Cartesian system 
of reference axes, then the square of the distance between any two points 
in this space is given by the sum of the squares of the differences between 
paired coordinates of the two points. A rectangular Cartesian system consists 
of k mutually perpendicular (orthogonal) axes; the pairing of coordinates is 
done, of course, with respect to these k reference axes. For example, suppose 
that four persons are located in a space of size two by the following coordi- 
nates with respect to a rectangular Cartesian system: 


            Person a   Person b   Person c   Person d 
Axis 1         4          0          1          3 
Axis 2         9          5          8          6 


Designate this matrix as X. The square of the distance between persons a and 
b, say, is given by: 
$(4 - 0)^2 + (9 - 5)^2 = 32$. Since squares are being summed, the result is 
obviously the same if we compute, instead, $(0 - 4)^2 + (5 - 9)^2 = 32$. 
A method of securing these Euclidean distances is to operate on the matrix 
X'X, where X' is the conventional transpose of X. For these data, X'X is 


          a     b     c     d 
    a    97    45    76    66 
    b    45    25    40    30 
    c    76    40    65    51 
    d    66    30    51    45 


The diagonal elements of X'X are simply the sums of the squares of the 
coordinates for a given person. The off-diagonal elements are the sums of the 
paired products of the coordinates for the two persons designated by the row 
and column headings. Thus, for person a the diagonal element is 
$(4)^2 + (9)^2 = 97$. The element 45, occurring in row b and column a, and in 
row a and column b as well, is given by $(4)(0) + (9)(5) = 45$. The square 
of the distance between persons a and b is then given by $97 + 25 - 2(45) = 32$, 
as before. This in effect merely uses the principle of rewriting a square of a 
difference between two terms as the sum of the squares of the terms minus 
twice their cross-product. 
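The arithmetic of this example can be reproduced directly. The sketch below uses the coordinates of the four persons given in the table; the function names `gram` and `squared_distance` are illustrative labels only. It forms the elements of X'X as sums of paired products and recovers a squared distance as the sum of two diagonal elements minus twice the corresponding off-diagonal element.

```python
# Coordinates of the four persons on the two reference axes (from the table in the text).
coords = {"a": (4, 9), "b": (0, 5), "c": (1, 8), "d": (3, 6)}

def gram(points):
    """Elements of X'X: sums of paired products of coordinates for each pair of persons."""
    names = list(points)
    return {(p, q): sum(x * y for x, y in zip(points[p], points[q]))
            for p in names for q in names}

def squared_distance(g, p, q):
    """Squared distance from X'X: two diagonal elements minus twice the off-diagonal one."""
    return g[(p, p)] + g[(q, q)] - 2 * g[(p, q)]

def direct_squared_distance(points, p, q):
    """Squared distance computed directly from coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(points[p], points[q]))

g = gram(coords)
print(g[("a", "a")], g[("a", "b")], squared_distance(g, "a", "b"))
```

Both routes give 32 for persons a and b, and they agree for every other pair as well.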


The diagonal elements of the matrix X'X give the squares of the lengths 
of each person vector in the k-space, and the off-diagonal elements give the 
scalar products of each pair of person vectors in this k-space. A measure of 
Euclidean distance between persons is thus given by the indicated operation 
on the matrix X'X, when the matrix X describes the several persons with 
respect to a rectangular Cartesian system. This operation may be formulated 
in matric terms. For any pair of persons, i and j, this operation consists of 
pre-multiplication by a row vector, E, of this form: 

    Persons    1   2  ...       i       ...       j       ...   N 
             [ 0   0  ...   0  +1   0   ...   0  -1   0   ...   0 ] 

followed by post-multiplication by the transpose of this vector. For example, 
the square of the distance between persons a and b is given by 

$$
\begin{bmatrix} 1 & -1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 97 & 45 & 76 & 66 \\ 45 & 25 & 40 & 30 \\ 76 & 40 & 65 & 51 \\ 66 & 30 & 51 & 45 \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix},
$$

which is equal to $(97 - 45) - (45 - 25) = 32$, as before. Thus, the square of 
the distance between any pair of persons is given by a product of matrices 
that may be written 

$$
E X'X E' = D^2.
$$
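The matric formulation can be sketched in the same way. The code below takes the matrix X'X from the example above, builds the row vector with +1 for one person and -1 for the other, and evaluates the product of the row vector, the matrix, and the transposed vector for every pair of persons. The helper names `selector` and `quadratic_form` are illustrative only.

```python
# X'X for the four persons a, b, c, d, as given in the text.
XTX = [[97, 45, 76, 66],
       [45, 25, 40, 30],
       [76, 40, 65, 51],
       [66, 30, 51, 45]]

def selector(i, j, n):
    """Row vector with +1 in position i, -1 in position j, and zeros elsewhere."""
    e = [0] * n
    e[i], e[j] = 1, -1
    return e

def quadratic_form(e, m):
    """Evaluate e * m * e' for a row vector e and a square matrix m."""
    n = len(e)
    return sum(e[r] * m[r][c] * e[c] for r in range(n) for c in range(n))

names = "abcd"
d2 = {(names[i], names[j]): quadratic_form(selector(i, j, 4), XTX)
      for i in range(4) for j in range(i + 1, 4)}
print(d2)
```

For persons a and b the result is 32, in agreement with the direct calculation.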


This D? may be interpreted in more than one way, depending upon how one 
defines uncorrelated variables. This problem must now be considered. 


Uncorrelated Variables 


In order to show the nature of this problem, let us define, loosely, an 
uncorrelated form as 

$$
TZZ'T' = \text{a diagonal matrix},
$$

where Z is a given matrix of data and T is a transformation. To avoid dis- 
cussing at this point certain problems of the form of the data, let us specify 
that Z consists of deviation scores that have been systematically reduced 
so that the variance of each row of Z equals unity. In other words, the data 
are taken in a form such that ZZ' is the conventional correlation matrix 
with units in the diagonals. Later, questions concerning the form of the 
data will be raised and Z will be shown to be a product of matrices, one of 
which is the matrix of raw scores. Now the definition of an uncorrelated 
form given above does not specify the non-zero elements in the diagonal 
matrix; in other words, it does not specify the weighting to be given each of 
the uncorrelated variables. Two systems of weighting appear to have special 
merit; one is given by 

$$
TZZ'T' = I
$$

and is loosely related to the Mahalanobis distance measure for groups. The 
other is given by 

$$
TZZ'T' = D_\lambda^2,
$$

where $D_\lambda^2$ designates the matrix of non-zero latent (or characteristic) roots of 
the matrix ZZ'; this definition is associated with the Cronbach-Gleser $D^2$ and, 
as they point out, with the Pearson CRL. These two weighting systems give 
different results; one weights the uncorrelated variables, i.e., the factor 
scores, equally; the other weights the factor scores in proportion to the 
size of the square roots of the latent roots. 


The Inverse Transformation 

First consider the transformation that yields equally weighted uncor- 
related variables. It always is possible to resolve Z into a product of 
principal-axis factors and factor scores; thus 

$$
Z = G D_\lambda P',
$$

where G is a set of ortho[...] (in standard form) corresponding to the 
non[...], where r is the rank of Z, and $P'P = I_r$. GG' is a [...] of ZZ'. 
For a summary, see Harris (4). [...] give a solution for T. Set 

$$
X = D_\lambda^{-1} G'Z = P'.
$$

Then $XX' = I$. This principle of [...] the principal-axis factor [...] defined 
above, these factor [...] means of zero. For the purpose of determining [...] 

$$
E X'X E' = E P P' E',
$$

[...] the X'X is simply the form PP', that is, the form [...]. The effect, then, 
of this transformation is to give distance measures that are functions of the 
equally-weighted principal-axis factor scores. 

It is conventional to ask concerning [...]. Consider PP'. If Z'Z is 
non-singular, as it might be, for example, if N, the number of persons, is less 
than k, the number of variables, then PP' necessarily is simply the identity 
matrix, I. 
This means then that, using this transformation principle, studying relatively 
few persons with respect to relatively more linearly independent variables 
always yields the same numerical value for the distance between every pair of 
persons, regardless of what set of variables was used. If N is greater than k, 
then Z'Z necessarily is singular and the matrix PP’ is a singular idempotent 
matrix that is a multiplication unit for a group (in the algebraic sense) of 
singular matrices. This also means that there are many sets of data that will 
yield the same matrix, PP’, when the inverse transformation is made and the 
resulting X’X calculated. In other words, under these conditions the distances 
between pairs of persons are not unique to a given set of data. For example, 
if we pre-multiply any given set of data, Z, by a non-singular matrix we 
leave invariant the matrix PP’, but not, of course, P itself. Since distance 
measures computed from data that have been transformed by this inverse 
transformation are functions of PP’, this lack of uniqueness to the given data 
should be recognized. It also should be recognized that these comments assume 
that the inverse transformation is developed from the data in hand rather than 
from data for a different group, such as a normative group. If the latter is 
done, these statements do not hold. 
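The lack of uniqueness described above can be seen in a small numerical sketch. It assumes the case in which ZZ' is non-singular, so that PP' can be computed as Z'(ZZ')⁻¹Z, and it uses centered (but, for simplicity, not unit-variance) scores for four persons on two variables; the matrix A and all function names are illustrative assumptions. Pre-multiplying Z by a non-singular matrix changes the data but leaves PP', and hence every distance computed from it, unchanged.

```python
from fractions import Fraction

def to_fractions(rows):
    return [[Fraction(v) for v in row] for row in rows]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def person_form(z):
    """PP' computed as Z'(ZZ')^{-1}Z; valid here because ZZ' (2 x 2) is non-singular."""
    zt = transpose(z)
    return matmul(matmul(zt, inverse_2x2(matmul(z, zt))), z)

# Deviation scores for four persons on two variables (k = 2, N = 4).
Z = to_fractions([[2, -2, -1, 1],
                  [2, -2, 1, -1]])
A = to_fractions([[2, 1],
                  [0, 3]])          # an arbitrary non-singular matrix (illustrative)

pp = person_form(Z)
pp_transformed = person_form(matmul(A, Z))
print(pp == pp_transformed)
```

The matrix PP' obtained here is also idempotent, with trace equal to the rank of Z, in line with the description of PP' as a singular idempotent matrix when N exceeds k.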

A direct, but quite arduous, calculation procedure would be to factor 
either ZZ' or Z'Z in order to determine P, the matrix of factor scores. Another 
calculation method results from the identity 

$$
PP' = Z'(ZZ')^{-1}Z,
$$

provided, of course, that ZZ' is non-singular. Still another calculation 
procedure is to utilize the principle of Rao's transformation (6) to develop 
a triangular matrix, C, such that 

$$
CZ = X.
$$

Then 

$$
X'X = Z'C'CZ,
$$

where C'C is the inverse of ZZ', provided it exists. It is interesting to observe 
that this latter method works even though ZZ' is singular. Adopting a new 
notation, 

$$
C'C = (ZZ')_{*}^{-1} = G D_\lambda^{-2} G';
$$

and, 

$$
Z'(ZZ')_{*}^{-1} Z = P D_\lambda G' G D_\lambda^{-2} G' G D_\lambda P' = PP',
$$

as before. This analysis uses the principle that if ZZ' is singular, then there 
exists a matrix $(ZZ')_{*}^{-1}$, which also is singular, such that 

$$
ZZ'(ZZ')_{*}^{-1} = (ZZ')_{*}^{-1}ZZ' = GG',
$$

where GG' is the symmetric idempotent matrix that is a unit for multiplication 
within the group. The factored form of $(ZZ')_{*}^{-1}$ is then seen to be $G D_\lambda^{-2} G'$. 
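The singular case can be illustrated with a deliberately simple example of rank one, built directly from chosen factors rather than obtained by factoring data. The vectors G and P' and the value 5 for the single non-zero weight are hypothetical. The sketch forms the generalized inverse in its factored form, G times the inverse squared weight times G', and checks that its product with ZZ' in either order is GG', and that it reproduces PP'.

```python
from fractions import Fraction as F

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

# Hypothetical rank-one factors: G is k x r with G'G = 1, P' is r x N with P'P = 1.
G = [[F(3, 5)], [F(4, 5)]]
D = F(5)
P_T = [[F(1, 2), F(-1, 2), F(-1, 2), F(1, 2)]]

Z = [[g[0] * D * p for p in P_T[0]] for g in G]         # Z = G D P'
ZZT = matmul(Z, transpose(Z))                           # singular 2 x 2 matrix
GGT = matmul(G, transpose(G))
GEN_INV = [[x / D ** 2 for x in row] for row in GGT]    # factored form G D^{-2} G'

PERSON_FORM = matmul(matmul(transpose(Z), GEN_INV), Z)  # Z' (generalized inverse) Z
PPT = matmul(transpose(P_T), P_T)
print(PERSON_FORM == PPT)
```

Here ZZ' has determinant zero, so no ordinary inverse exists, yet the factored generalized inverse still returns PP'.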
Principal-Axis Transformation 


The inverse transformation discussed above gives as the uncorrelated 
form of the variables a diagonal matrix whose non-zero entries each equal 
unity. In other words, the inverse transformation gives uncorrelated variables 
of equal (unit) variance. As noted above, a different transformation may be 
defined by requiring that the transformation matrix, T, be such that 

$$
TZZ'T' = D_\lambda^2,
$$

where, as before, $D_\lambda^2$ is the matrix of non-zero latent roots of ZZ'. This is 
the familiar canonical form of a symmetric matrix; as such, it is a well- 
known definition of an uncorrelated form. It differs from the inverse trans- 
formation in that the transformed variables are now weighted unequally, 
rather than equally, these unequal weights being given by the square roots 
of the roots of the characteristic equation of the symmetric matrix ZZ'. 
If this is chosen as the uncorrelated form, then the transformation is accom- 
plished by setting 

$$
X = G'Z = D_\lambda P'.
$$

It then follows that $XX' = D_\lambda^2$, as required. In order to determine distances 
between pairs of persons, calculate 

$$
E X'X E' = E P D_\lambda^2 P' E' = E Z'Z E',
$$

since GG' is a unit for multiplication, as described above, regardless of the 
[...]. Choosing the canonical form of ZZ' as the uncorrelated form of the 
variables gives distances between pairs of persons as a [...] Z'Z. The 
calculation procedure obviously requires no comment. For distance measures 
this solution is unique to the given set of data; this is related [...] 

[...] choices that are meaningful statistical concepts are the identity 
matrix, I, associated [...] diagonal $D_\lambda^2$, associated [...] are weighted 
equally, whereas [...] are weighted unequally. 


The Form of the Data 
Consider now a matrix of data, Y, that consists of the observed measures,
i.e., the raw scores. In order to transform these data into the form of Z,
first write
$YL = [y]$,
with [y] the matrix of deviation scores. For any Y, which is of order k by N,


the matrix L which accomplishes the transformation of raw to deviation 
scores is 


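$$
L = \begin{pmatrix}
\frac{N-1}{N} & -\frac{1}{N} & -\frac{1}{N} & \cdots & -\frac{1}{N} \\
-\frac{1}{N} & \frac{N-1}{N} & -\frac{1}{N} & \cdots & -\frac{1}{N} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-\frac{1}{N} & -\frac{1}{N} & -\frac{1}{N} & \cdots & \frac{N-1}{N}
\end{pmatrix}
$$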


The matrix L is square, symmetric, of order N, and of rank (N — 1). Direct 
multiplication verifies that $L = L^2$, i.e., that L is idempotent. The matrix
L is a quadratic form with roots of unity and is an example of the type of 
matrix referred to in Cochran's theorem (1). Further, L is a multiplication 
unit for any vector consisting of N terms that sum to zero; the row vector E 
employed earlier is such a vector. Note that what is being done here is to take 
an operation that is ordinarily considered to be an additive one and to write 
it as a multiplicative operation; this becomes a useful tool in the analysis 
of certain relationships among matrices. It is now possible to write 


SYL = Z, 


that is, to pre-multiply the deviation scores by the appropriate diagonal 
matrix to secure the form Z. The diagonal matrix is, of course, one in which 
each of the non-zero elements is given by the reciprocal of the product of 
the square root of N and the standard deviation of the variables. 
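The properties claimed for L are easy to verify on a small case. The sketch below uses plain Python lists; the scores in `Y` and the pair vector `E` are hypothetical values chosen only for illustration, and the helper names are not from the paper.

```python
def centering_matrix(n):
    """Return L with (n - 1)/n on the diagonal and -1/n elsewhere."""
    return [[(1.0 if r == c else 0.0) - 1.0 / n for c in range(n)]
            for r in range(n)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


N = 5
L = centering_matrix(N)
LL = matmul(L, L)                       # should reproduce L (idempotence)
trace_L = sum(L[i][i] for i in range(N))  # equals the rank of an idempotent matrix

# Hypothetical raw scores: 2 variables (rows) by 5 persons (columns).
Y = [[3.0, 5.0, 4.0, 8.0, 10.0],
     [2.0, 2.0, 6.0, 4.0, 1.0]]
deviations = matmul(Y, L)               # YL = [y], the deviation scores

# A row vector whose terms sum to zero, selecting persons 1 and 2.
E = [[1.0, -1.0, 0.0, 0.0, 0.0]]
EL = matmul(E, L)                       # L acts as a unit: EL equals E

print(deviations[0], EL[0], trace_L)
```

Writing the subtraction of row means as the product YL is what lets the later identities be handled by ordinary matrix algebra.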

Using this definition of Z and noting that S is a non-singular matrix, 
that L is idempotent, and that L is a multiplication unit for E, the distance 
between any pair of persons under the inverse transformation becomes 


$EZ'(ZZ')^{-1}ZE' = EY'(YLY')^{-1}YE'$.


(A method of calculation when ZZ’ is singular was suggested above.) Reduced
to these terms, then, this distance measure is a function of the raw scores 
and of the inverse of the matrix of variances and covariances. An analogous 
reduction of the Cronbach-Gleser measure gives 


$EZ'ZE' = EY'S^2YE'$,
showing that it is a function of the raw scores and the reciprocals of the
variances of the variables, since the scalar 1/N affects all pairs in the same way.
Clearly, the two measures are identical if, and only if, YLY’ is a diagonal
matrix of variances. It also is evident that


$EY'S^2YE' \neq EY'YE'$,
that is, that any change in scale for one or more variables affects the Cron-
bach-Gleser measure. This is not true for the measure derived from the
inverse transformation.


Other modifications of the
techniques. One such modificati


ESsY'MYS,E' = EY'MYE', 

which merely states that any change in scale for the columns affects this type
of measure.

Finally, MYL is a doubly centered matrix; that is, it sums to zero both by
rows and by columns. Since matrix multiplication is a linear associative algebra, it
makes no difference in which order the two centering operations are applied; the
resulting MYL is the same in either case. Now, since EL = E for any E, it can be
seen that


$EY'MYE' = ELY'MYLE'$.


This identity emphasizes the point that if the data are centered by columns 
then double-centering does not alter this particular distance measure. 
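The contrast between the two measures under a change of scale can be illustrated with two variables, for which the inverse of YLY’ can be written out by hand. This is a sketch under stated assumptions: the scores are hypothetical, the function names are illustrative, and the code handles only the two-variable case.

```python
def deviation_crossproducts(Y):
    """Return the 2 x 2 matrix YLY' of deviation-score sums of squares and products."""
    n = len(Y[0])
    dev = [[v - sum(row) / n for v in row] for row in Y]
    return [[sum(a * b for a, b in zip(dev[r], dev[c])) for c in range(2)]
            for r in range(2)]


def inverse_distance(Y, p, q):
    """E Y'(YLY')^{-1} Y E' for persons p and q (two variables only)."""
    (a, b), (c, d) = deviation_crossproducts(Y)
    det = a * d - b * c
    u, v = Y[0][p] - Y[0][q], Y[1][p] - Y[1][q]
    # Quadratic form with the explicit inverse (1/det) [[d, -b], [-c, a]].
    return (d * u * u - (b + c) * u * v + a * v * v) / det


def raw_distance(Y, p, q):
    """E Y'Y E', the squared distance computed on raw scores."""
    return sum((row[p] - row[q]) ** 2 for row in Y)


# Hypothetical raw scores: 2 variables (rows) by 5 persons (columns).
Y = [[12.0, 15.0, 11.0, 18.0, 14.0],
     [7.0, 9.0, 6.0, 8.0, 10.0]]
# The same data with the first variable expressed in units ten times smaller.
Y_rescaled = [[10.0 * v for v in Y[0]], Y[1]]

print(raw_distance(Y, 0, 1), raw_distance(Y_rescaled, 0, 1))
print(inverse_distance(Y, 0, 1), inverse_distance(Y_rescaled, 0, 1))
```

In this example the raw-score distance changes with the unit of the first variable, while the distance based on the inverse of YLY’ does not.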
Table 1 gives the algebra of distance m 


iven i easures betw irs of students 
that are given in Table 3. ween pairs of s 
TABLE 1


Distance Measures for Various Forms of the Data 


Form of Data Inverse Transformation 


ey’ (ҮҮ y ie? 


z^ (yLy’ a e 


YL 
SL ayy y e’ 

we Ex "u(uyy ^u)" hire 
MIS ЕЗҮ u (музы) uns e 


EY ^u (uvLy ^u)- ВЕ” 


TABLE 2 


Illustrative Distance Measures between Pairs of Students 
ЕЕЕ 


EY'YE* Ey’ (vr*)-1ve* 


TABLE 2 


Illustrative Scores on Three Tests for Five Students 


Student 


1. Spelling 


2. Usage 
ЕА 


Vocabulary | 18 


11.82 е 1.23 1.84 
8.08 11.61 а |1230 1.80 2.00 
1.99 1.92 1.89 


5.52 11.25 8.33 
THE ESTIMATION OF THE DISCRIMINAL DISPERSION
IN THE METHOD OF SUCCESSIVE INTERVALS* 


Raymond H. Burros

A new algebraic formula is derived for estimation of the discriminal
dispersion in the method of successive intervals. The legitimate use of the
formula requires that as many normal deviates as possible be present in the
matrix. For this reason, it is recommended that deviates corresponding to the
interval (0.01, 0.99) of the cumulative proportions be used, instead of those
corresponding to (0.05, 0.95), the interval used by Edwards and Thurstone.
Computations on data published by Edwards and Thurstone showed that
when adjustment was made for variability in dispersions calculated by the
formula of this paper, a reduction of fifty per cent in mean absolute discrep-
ancy was produced. Since the formula is easy to use and avoids the disadvan-
tages of its predecessors, it should have fairly wide applicability in psycho-
logical research.


The method of successive intervals is perhaps the most practical way
of obtaining rational scale values of stimuli along a unidimensional psycho-
logical continuum not simply correlated with any physical variable. The
data may be provided by any procedure in which judges classify stimuli into
a finite number of mutually exclusive and exhaustive classes which are ordered
along some dimension.

When the number of stimuli is small, they may be ranked without
ties, so that the number of classes equals the number of stimuli. When the
number of stimuli is large, they may be either sorted into piles or rated on a
rating scale. With either of these procedures, the number of classes may be
considerably less than the number of stimuli. For adequate reliability, a
large sample of judges is needed when any of these techniques of gathering
data is used.

Although successive intervals was developed by L. L. Thurstone, the
first published account was given in a paper by Saffir (8) in 1937.

Papers by Edwards (4) and Edwards and Thurstone (5) have prese

*This research was supported in part by the United States Air Force under Contract
No. AF B Gon) DOT20 monitored by Air Force Personnel and Training Research Center.
Permission is granted for reproduction, translation, publication, use, and disposal in whole
and in part by or for the United States Government. The writer is indebted to Dr.
Edwards for a critical reading of an earlier version of this paper, and to Dr. L. H. Lanier
and Dr. L. M. Stolurow for editorial advice on the present version, which was written at
the University of Illinois. The editors of Psychometrika have informed the writer that
H. J. A. Rimoldi and M. Hormaeche (7) have independently derived the same formula for
the discriminal dispersion from a different set of postulates using the law of comparative
judgment.
check on internal consistency which indirectly tests the applicability of the
postulates to any particular set of data. This check now makes successive 
intervals a serious rival to the method of paired comparisons. The advantage 
of successive intervals over paired comparisons lies in its greater speed in 
collecting data. Empirical studies (4, 5) have shown that there is a linear 
relation between scale values obtained by these two methods. 

In any stimulus scaling method developed in the Thurstone manner, 
there are at least two important kinds of parameters, represented respec-
tively by $S_j$, the scale value of the jth stimulus, and $\sigma_j$, the corresponding
al techniques for 
Is have been pub- 
are subject to improvement. 


stone and presented by Saffir 
(8), does not base the computation of each $\sigma_j$ on all of the data. Also, it does


1 omparisons. It is interesting to 
р that 1n à recent paper on successive intervals, Edwards and 


(10, 2, 3). The use of the formula. will t i fe 
analysis of data presented by Edward 


Definition of Symbols

R = postulated unidimensional psychological continuum with finite range, divided
into N class intervals corresponding to the steps on an N-point rating scale;

S = postulated unidimensional psychological continuum with unrestricted
range corresponding to R;

$R_k$ = upper true limit of kth class interval of R;

$S_k$ = corresponding upper true limit of kth class interval of S;

$r_j$ = momentary estimate by a judge of the scale position of the jth
object on R;

$s_j$ = corresponding momentary estimate of the scale position of the jth
object on S;

$S_j$ = scale position (mean and median of $s_j$) of jth object on S;

$\sigma_j$ = discriminal dispersion or standard deviation of distribution of $s_j$;

$X_{jk}$ = $(S_k - S_j)/\sigma_j$;

$z_j$ = $(s_j - S_j)/\sigma_j$;

$P_{jk}$ = probability that $r_j < R_k$.


Postulates
1. There exists a unidimensional psychological continuum (R) with
a finite range, arbitrary units, and an arbitrary origin.
2. There exists a corresponding unidimensional psychological continuum
(S) with an unrestricted range, equal units, and an arbitrary origin.
3. S = f(R), where the function is monotonic, increasing, and generally
nonlinear.
4. For object j, and corresponding to each observed momentary estimate
($r_j$) by a judge on R, there exists a theoretical momentary estimate ($s_j$)
on S. The distribution of $s_j$ is normal with mean (and thus median) of $S_j$
and standard deviation $\sigma_j$.


Basic Theorem
Since f(R) is monotonic increasing,
$P_{jk} = P(r_j < R_k) = P(s_j < S_k)$.
But if $s_j < S_k$, then
$s_j - S_j < S_k - S_j$ and $(s_j - S_j)/\sigma_j < (S_k - S_j)/\sigma_j$,
so that
$z_j < X_{jk}$
by definition of these quantities. Therefore,
$P_{jk} = P(z_j < X_{jk}) = G(X_{jk})$,
where G is the normal probability integral. Given each estimated value of
$P_{jk}$ determined by the empirical frequency distribution of $r_j$ on R, therefore,
the corresponding estimated value of $X_{jk}$ can be found from a table of the
normal integral. These may be arranged in a matrix X.

Whenever $P_{jk}$ equals 0 or 1, the value of $X_{jk}$ is indeterminate. Whenever the
proportion is too near to either 0 or 1 (say, less than 0.01, or greater than
0.99) the values of $X_{jk}$ are too unreliable to be recorded. Whenever a value
of $X_{jk}$ is indeterminate or unreliable, it is omitted from the X matrix.
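The conversion from cumulative proportions to normal deviates, with unreliable cells omitted, can be sketched as follows. The function name, the use of `None` for an omitted cell, and the sample proportions are assumptions made for illustration; the inverse normal integral comes from Python's standard `statistics.NormalDist` rather than from a printed table.

```python
from statistics import NormalDist


def normal_deviates(cum_props, lo=0.01, hi=0.99):
    """Map each cumulative proportion to its normal deviate.

    Proportions below `lo` or above `hi` (including 0 and 1, which are
    indeterminate) are treated as unreliable and returned as None.
    """
    inverse = NormalDist().inv_cdf
    return [[inverse(p) if lo <= p <= hi else None for p in row]
            for row in cum_props]


# Hypothetical cumulative proportions for one stimulus over five boundaries.
example = normal_deviates([[0.0, 0.005, 0.5, 0.975, 1.0]])
print(example)
```

Passing `lo=0.05, hi=0.95` gives the narrower interval used by Edwards and Thurstone, which leaves more cells empty.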


Derivation of Formula for $\sigma_j$


From the definition of $X_{jk}$, it follows that

$S_k = S_j + \sigma_j X_{jk}$. (1)
Similarly for $X_{ik}$,
$S_k = S_i + \sigma_i X_{ik}$. (2)
Therefore,
$S_j + \sigma_j X_{jk} = S_i + \sigma_i X_{ik}$ (3)
and
$X_{jk} = (S_i - S_j)/\sigma_j + (\sigma_i/\sigma_j) X_{ik}$. (4)

Equation (4) says that the jth row in the X matrix is theoretically a linear
function of the ith row with slope of

$m = \sigma_i/\sigma_j$. (5)

Theoretically these two rows are perfectly correlated.
Let $V_i$ and $V_j$ be the respective measures of variability, e.g., standard
deviations or ranges, of the ith and jth rows of X. Assuming perfect correla-
tion, therefore, the slope of (4) is also equal to

$m = V_j/V_i$. (6)
Therefore,

$\sigma_i/\sigma_j = V_j/V_i$ (7)
and

$V_i\sigma_i = V_j\sigma_j$. (8)
Thus, for any two stimulus objects i and j either side of (8) theoretically
equals a constant, defined as

$a = \sigma_j V_j$. (9)

Therefore,

$\sigma_j = a/V_j$. (10)
In order to estimate a, a unit of $\sigma_j$ must be chosen. This is an arbitrary matter.
The simplest definition is that the unit is the mean of the sigmas of the n
stimuli, i.e.,

$(\sum \sigma_j)/n = 1$ (11)
and
$\sum \sigma_j = n$. (12)
Summing (10), and using (12),
$n = \sum \sigma_j = a \sum (1/V_j)$. (13)
Therefore,
$a = n / \sum (1/V_j)$. (14)

Thus, a is the harmonic mean of the values of $V_j$, which are obtained em-
pirically from the rows of the X matrix. After a is estimated from (14), each
estimated value of $\sigma_j$ is given by (10).
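Equations (10) and (14) amount to a two-line computation. The sketch below applies them to the ten values of $V_j$ listed in this paper's Table 1; the function name is illustrative, and because the reciprocals are computed at full precision the result may differ in the third decimal from the rounded figures in the table.

```python
from statistics import harmonic_mean


def discriminal_dispersions(V):
    """Return (a, sigmas): a by equation (14), each sigma_j by equation (10)."""
    n = len(V)
    a = n / sum(1.0 / v for v in V)      # harmonic mean of the V_j
    return a, [a / v for v in V]


# Row variabilities V_j as listed in Table 1 of this paper.
V = [1.08, 1.26, 1.16, 1.25, 1.24, 1.52, 1.34, 1.48, 0.998, 0.950]
a, sigma = discriminal_dispersions(V)

print(round(a, 2), [round(s, 2) for s in sigma], round(sum(sigma), 2))
```

By construction the estimated dispersions sum to n, which is the unit convention of equation (12).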

Now that the new formula has been derived, it is possible to criticize 
Attneave’s technique. According to Attneave, this “assumes that the mean 
dispersion of stimuli represented in one dichotomy is equal to the mean 
dispersion of those represented in another; this assumption may be only 
approximately correct” (1, p. 340). A sufficient condition for this assumption 
is that all entries in the X matrix are present. When this is so, it follows from 
Attneave’s directions that the discriminal dispersion of any stimulus equals 
the ratio of the arithmetic mean of the ranges of stimulus X values to the 
range of the given stimulus. Equations (10) and (14) show, however, that 
the proper average for the numerator of the ratio is the harmonic mean, 
not the arithmetic mean. Even when Attneave’s assumption is known to be
true, therefore, his technique is not strictly correct. It may sometimes give 
adequate results, however, if the arithmetic mean range and the corresponding 
harmonic mean are approximately equal. 


Application of Formula 


To save space, tables presented by Edwards and Thurstone (5) will 
not be reproduced. They provide the following relevant data: (a) their 
Table 1 (5, p. 172) gives the cumulative proportions $P_{jk}$ for ten stimuli
rated on a nine-point scale; (b) their Table 2 (5, p. 173) gives the normal
deviates $X_{jk}$ corresponding to the proportions in the closed interval (0.05,
0.95). In order to reduce the number of empty cells, the writer entered into 
a copy of this table those additional deviates required to encompass the 
interval (0.01, 0.99) of the proportions. 
Table 1 of this paper shows the results of the computations. $V_j$ is the
standard deviation of the reliable normal deviates for stimulus j based upon
$N_j$ values. The parameter a is then computed by (14) to be

$a = n/\sum (1/V_j) = 10/8.332 = 1.20$.

Then each value of $\sigma_j$ is computed in Table 1 as
$\sigma_j = a(1/V_j)$.

TABLE 1

Calculation of the Discriminal Dispersion ($\sigma_j$) from Data
Presented by Edwards and Thurstone (5)

j*     N_j**    V_j      1/V_j    σ_j
1      6        1.08     0.926    1.11
2      7        1.26     0.794    0.95
3      6        1.16     0.862    1.03
4      8        1.25     0.800    0.96
5      8        1.24     0.806    0.97
6      6        1.52     0.658    0.79
7      8        1.34     0.746    0.90
8      7        1.48     0.676    0.81
9      8        0.998    1.012    1.21
10     8        0.950    1.052    1.26
Sum                      8.332    9.99***

*As rank order (j) of the stimulus increases, the scale value tends to decrease.

**N_j is the number of normal deviates corresponding to proportions in the interval
(0.01, 0.99).

***Presumably errors from rounding off decimals account for the departure of this
sum from the theoretical value (10.00).
Discussion 


The estimates of the dispersions calculated for successive intervals
by the formula of this paper correspond roughly to the estimates of the


Although Edwards and Thurstone
tion in $\sigma_j$ computed from their succ
experience with paired comparisons, some additional computations made
by the writer may be of interest.

First, using the normal deviates corresponding to proportions in the
interval (0.01, 0.99) and the dispersions previously computed by the new 
formula, the successive intervals scale values of these stimuli were com- 
puted by an algebraic technique. Since there was close correspondence 
with the scale values reported by Edwards and Thurstone, who used the 
interval (0.05, 0.95), no details about these computations need be given here. 

Then the mean absolute discrepancy was computed to be 0.0135. This 
is half of the value, 0.027, reported by Edwards and Thurstone when adjust- 
ment was made in successive intervals for variability in dispersions calculated 
by Attneave’s technique. It is concluded that the mean absolute discrepancy 
may be considerably less than that reported by Edwards and Thurstone 
when correction is made for variability in dispersions calculated by the formula 
of this paper. 

In order to fulfill the requirements of the formula, however, the number 
of empty positions in the X matrix should be reduced. The use of a wider 
interval of acceptably reliable proportions, i.e., (0.01, 0.99) instead of (0.05, 
0.95) will produce this desired result. The use of this wider interval is, there- 
fore, recommended. 

Since the formula presented here avoids the disadvantages of its pre- 
decessors but is easy to use, it should have fairly wide applicability in psy- 
chological research. 
The law of comparative judgment is applied to the successive intervals
and graphic rating scale methods. A procedure for estimating the modal 
discriminal process and discriminal dispersion of the stimuli, as well as the 
value of the boundaries of the intervals on the continuum, is given. From the 
estimated values it is possible to determine the theoretical proportions and to 
compare them with the actual experimental proportions. The agreement be- 


tween these values is an indication of the adequacy of the assumptions made. 


The rationale and the system of computations described in the present 
paper developed from a suggestion offered by L. L. Thurstone in one of his 
courses at the University of Chicago. He suggested an interpretation of the 
method of successive intervals based on the assumption that, in the process 
of indicating preferences, a subject will compare the affective value of each 
stimulus with the affective value represented by the interval limits on the 


psychological continuum. 


In the present study stimuli were presented using three different pro-
cedures:


1. Method of successive intervals. The subject was presented with equally
spaced intervals having reference to degree of interest in an indicated stimulus.
His placement of a check mark in any interval was interpreted as indicating
that his interest in that stimulus was greater than that represented by the
lower limit of the interval and smaller than the interest represented by its
upper limit. A pre-test on approximately 30 subjects demonstrated experi-
mentally that the continuum could be defined unambiguously.


*This article is the first part of a larger study conducted at the Laboratorio de Psico-
logia, Facultad de Humanidades y Ciencias, Montevideo, Uruguay, during the years 1951
The authors want to thank Dr. L. V. Jones for his critical comments.
The authors have been informed by the editors of Psychometrika that
Burros (2) has independently reached the same analytic solution for the computation of
stimulus dispersions. Dr. Burros has used a set of assumptions different from the ones stated
in the present paper.
Now at Loyola University, Chicago, Ill.
2. Multiple category method. This is a variation of the previous procedure.
Subjects were instructed to encircle the word or sign (Yes, yes, ?, no, No)
that best represented their interest in the stimulus.


3. Graphic rating scale. Here the subject was asked to state his interest
in each stimulus by placing a check mark on a straight line without divisions.
The location of the check mark on the continuum was interpreted as indi-
cating that the interest of the subject in the stimulus was greater than that
represented by the points on the continuum located to the left of the check
mark, and smaller than that represented by the points on the continuum
located to the right of the check mark.

In all the presentations it was assumed that the subject compared the 


value of the stimulus with the value represented by the different points on 
the continuum. 


Determination of $L_i$


Let $S_j$ (j = 1, 2, ..., n) represent the modal discriminal process
for the jth stimulus, and $\sigma_j$ (j = 1, 2, ..., n) the discriminal dispersion
for the jth stimulus. $L_i$ (i = 1, 2, ..., m − 1) is the modal discriminal
process of the boundary between intervals i and i + 1, where there are m
successive intervals, and $d_i$ (i = 1, 2, ..., m − 1) represents the
discriminal dispersion of the ith boundary. $d_i$ is generated in a manner
similar to that described by Thurstone (7) for the discriminal dispersion of
the stimuli.

It will be assumed that the stimuli are normally distributed and that,
together with the interval limits, they can be located on the same psycho-
logical continuum. Throughout the study it will be assumed that we are
dealing with Thurstone’s case II (8), where several individuals made one
judgment each.
The origin of the scale will be defined as
$\sum_j S_j = 0$ (1)
and the unit of measurement will be
$\sum_j \sigma_j = n$. (2)
According to the law of comparative judgment (8) and to the previous
assumptions it is possible to write

$L_i - S_j = X_{ij}\sqrt{d_i^2 + \sigma_j^2 - 2 r_{ij}\sigma_j d_i}$, (3)

where $r_{ij}$ is the correlation between stimulus j and
interval boundary i.

It seems defensible to assume that $d_i$ will be very small when compared
with $\sigma_j$ and that the more precise the definition of the continuum the smaller
the value of $d_i$. This assumption seems to be corroborated by an unpublished
investigation by L. V. Jones at the University of Chicago. The empirical
evaluation of $d_i$, in terms of the unit of measurement of (2), demonstrates
the magnitude to be no greater than .05 and generally much smaller. Ignoring
the value of $d_i$, (3) becomes
$L_i - S_j = X_{ij}\sigma_j$. (4)
Adding and averaging (4) for all stimuli and keeping $L_i$ constant we
have, using (1),

$(\sum_j X_{ij}\sigma_j)/n = L_i$. (5)


Determination of the Modal Discriminal Process for Each Stimulus


According to (4) it is possible to have as many $S_j$ values as there are
$L_i$ points on the continuum. Keeping $S_j$ constant and adding and averaging
for all the $L_i$ values, we obtain

$(\sum_i L_i - \sigma_j \sum_i X_{ij})/(m - 1) = S_j$. (6)


Determination of the Discriminal Dispersions
Subtracting (4) for stimuli 1 and 2, we have
$L_i - S_1 = X_{i1}\sigma_1$
$L_i - S_2 = X_{i2}\sigma_2$ (7)
$S_2 - S_1 = X_{i1}\sigma_1 - X_{i2}\sigma_2$.
Equation (7) is similar to the basic equation used in the scaling of mental
tests (6) and may be written
$X_{i1} = X_{i2}(\sigma_2/\sigma_1) + (S_2 - S_1)/\sigma_1$, where $(S_2 - S_1)/\sigma_1 = K$ = constant.
Thus,
$X_{i1} = X_{i2}(\sigma_2/\sigma_1) + K$. (8)

There are as many $X_{1j}, X_{2j}, \ldots, X_{ij}, \ldots, X_{(m-1)j}$ values as there are $L_i$
points on the continuum. From these values n standard deviations, $V_j$,
may be computed by

$V_j = \sqrt{(m - 1)\sum_i X_{ij}^2 - (\sum_i X_{ij})^2}\,/\,(m - 1)$.

Performing the necessary operations, (8) can be written as
$V_1 = V_2(\sigma_2/\sigma_1)$

and consequently
$\sigma_1 = c/V_1$, where $V_2\sigma_2 = c$ = constant. (9)

Changing subscripts of $\sigma$ and V from 1 to j and adding all the resulting
equations,
$n = c \sum_j (1/V_j)$, since $\sum_j \sigma_j = n$,

and accordingly $c = n/\sum_j (1/V_j)$. (10)

From (9) and (10) it is readily seen that the value of the discriminal dispersion
for any stimulus j is given by

$\sigma_j = n/[V_j \sum_j (1/V_j)]$. (11)
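Formula (11) can be tried on a fictitious example in which the true dispersions are known, in the spirit of the check the authors describe. The boundary values, scale values, and dispersions below are hypothetical and chosen to satisfy (1) and (2); the function names are illustrative only.

```python
from math import sqrt


def column_sd(xs):
    """Standard deviation of the m - 1 deviates of one stimulus (divisor m - 1 values)."""
    count = len(xs)
    return sqrt(count * sum(x * x for x in xs) - sum(xs) ** 2) / count


def dispersions_from_X(X):
    """X[i][j] is the deviate for boundary i and stimulus j; apply formula (11)."""
    n = len(X[0])
    V = [column_sd([row[j] for row in X]) for j in range(n)]
    total = sum(1.0 / v for v in V)
    return [n / (v * total) for v in V]


# Fictitious example: 4 boundaries, 3 stimuli, known parameters.
L_true = [-1.5, -0.5, 0.5, 1.5]
S_true = [-0.6, 0.1, 0.5]           # sums to zero, as in (1)
sigma_true = [0.8, 1.0, 1.2]        # sums to n = 3, as in (2)
X = [[(L - S) / s for S, s in zip(S_true, sigma_true)] for L in L_true]

sigma_hat = dispersions_from_X(X)
print([round(s, 6) for s in sigma_hat])
```

With error-free deviates the formula returns the dispersions that generated the data, because each $V_j$ is then exactly proportional to $1/\sigma_j$.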


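If the values of the discriminal dispersions are equal, then according to
(8), the slope of the line will be unity. If it can be assumed that the dispersions
of the stimuli are equal, then according to (2) each $\sigma_j$ equals unity, and (5)
and (6) become

$L_i = (\sum_j X_{ij})/n$, (12)
and $S_j = (\sum_i L_i - \sum_i X_{ij})/(m - 1)$. (13)

Reproduction of the Original Experimental Proportions

From (4)
$X_{ij} = (L_i - S_j)/\sigma_j$. (14)
If the discriminal dispersions are assumed to be equal, then
$X_{ij} = (L_i - S_j)$. (15)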
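The remaining steps, equations (5), (6), and (14), can be sketched together. The example is fictitious, with hypothetical parameter values that satisfy (1) and (2), and it is an assumption of this sketch that the reproduced quantity is the cumulative proportion below each boundary, obtained by passing the deviate from (14) through the normal integral.

```python
from statistics import NormalDist


def boundaries_and_scale_values(X, sigma):
    """Return (L, S) from deviates X[i][j] and dispersions sigma[j]."""
    count, n = len(X), len(sigma)    # count = m - 1 boundaries, n stimuli
    L = [sum(X[i][j] * sigma[j] for j in range(n)) / n
         for i in range(count)]                                   # equation (5)
    S = [(sum(L) - sigma[j] * sum(X[i][j] for i in range(count))) / count
         for j in range(n)]                                       # equation (6)
    return L, S


def theoretical_proportions(L, S, sigma):
    """Cumulative proportion below boundary i for stimulus j, via equation (14)."""
    cdf = NormalDist().cdf
    return [[cdf((Li - Sj) / sj) for Sj, sj in zip(S, sigma)] for Li in L]


# Fictitious example with known parameters.
L_true = [-1.5, -0.5, 0.5, 1.5]
S_true = [-0.6, 0.1, 0.5]
sigma_true = [0.8, 1.0, 1.2]
X = [[(L - S) / s for S, s in zip(S_true, sigma_true)] for L in L_true]

L_hat, S_hat = boundaries_and_scale_values(X, sigma_true)
P_hat = theoretical_proportions(L_hat, S_hat, sigma_true)
print([round(v, 6) for v in L_hat], [round(v, 6) for v in S_hat])
```

Setting every dispersion to 1 in the same functions gives the equal-dispersion case of equations (12), (13), and (15).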


Experimental Results

Subjects were asked to state their degree of "interest in knowing"
certain people. The stimuli were names of men and women who were well
known to the experimental group. (See Tables 2, 3, 4, and 5.)
TABLE 1 


J Formila (11) 
| dix 1.55 
= 5з BI 
a 1.10 1.06 
e i = 
D 1.3. 3:35 
E 1:05 1.55 
тыы. дш 


| 
TABLE 2


Method of Successive Intervals 


(Decimal points omitted.) 


Roosevelt T 005 От 015 07 086 162 186 2% 218 


— 1 XD Gi de ой c6 15 mo ой оз 
Hitter лое од 98 (9 im in d» d€ 19 
— коес Е 
Marie Antoinette А os 96 ae 95 s 2) i5 s Es 
азои a or c 5 US ше 1 X ао ш 
Sinn dire г о M SR Re ® 15 
РЕА ооо m ш ос па е 
San заг $ о о бт on de оз в la 05 
Beaver ? 98 о o о ш M6 ш m o 
Teens той o» d» no à B ш m d 
Cleopatra т оз o оу оз мс i5 وا‎ з % 
Dante 2 @ cs Qe ot i ig ш H5 тш 
ne Somos S mom 
ee тшшш om og oi om os o 
€— i m uw ow Nom ox om ow 
- SSF Om mom om mom 
usm — 1 X M OM S шш mm 
mme i OR MONDO M ош m sn ор 
-— : 2k m od mom x xm ow 
mame 4 E 
moo 4 ROM NM ON m om os ew 
-— iS SSS ou ou om x oH 
ne § RRS SS Be 
y A 006 012 055 106 229 од 165 110 055 
NE сеш 007 009 012 о 0271 029 020 029 012 
Standard Deviation = .025 Average Discrepancy (total) = .017 


TA = Actual proportions; Т = Theoretical proportions. 
TABLE 5
Graphic Rating Scale 


(Decimal points omitted.) 


1 2 5 k 5 6 1 8 9 
Ае 006 006 012 018 iW 229 2 129 212 
Roosevelt 


Te 005 008 015 025 168 195 197 187 203 
006 012 012 o 


Leonardo A 098 оп от ою 89 5 ise 16 ш 
м 28 22 2 за зи єє ш зш 
а то » X 5 M B оо xu 
Varie Antoinette А ois 505 dE es 2 1% 1n E en 
лены f ш ов оп 15 р m n ов оп 
ы ооо A E. 
ma mon mom ш э ш эб эв 
san юг T ш ms om Sy A Om m ow op 
Bécquer zoo or ws X m moi ош o 
тамаа т 08 o» ш ا‎ E ы ipo oo 
Cleopatra foo oo Of уз S ш ms ою ою 
Dante © 00 от فت‎ 99 з INS mop ою 
Beethoven т oe ce 98 98 ID N? we оз me 
Cervantes © б m om St Rf оз £m oos im 
Kapoteen toO ш Gh 99 Ip D 260 a 
Chopin toG ш ш d o OD Mom m 
aee" 7 Qo sn d) Go NS ы» om 
сш т oe ae 00 F юш ш os ою 
Geant © оой ш ШОШ б oom эз ш 
“eee 1 8 S Om om om щш шошо ок | 
we от о ш M OU OM n om ь ш 
E : ES m mous 
ҮЗ? z 99. 005 05 op 275 3 ATE 
Bolivar 


ОА 05 ош o5 от 


206 10 щ пә 
oT 1038 0% 260 ооу 160 123} 106 
Average Discrepe 


ancy (per colum) 006 оо оо о 0% 024 ов o2 012 
Standard Deviation = „022 


Average Discrepancy (total) = .016 
*A = Actual Values T = Theoretical Value, 


a 
© 
© 
The stimuli were presented in four different ways: paired comparisons
(15 stimuli), successive intervals e ae bend к, 
(15 stimuli) and graphic rating scale 5 stimuli). imis eure 
boundaries were superimposed on the continuum to score Seine ae 
by using the graphic rating scale method. The 15 names ae 
i i гете common to all the other methods. Instru 2 
r^ то Е мя that the continuum varied from “extreme jos 
of interest" to "extreme interest" in knowing the persons indicated by th 
The experimental population consisted of 170 adults, of both sexes,
most of them enrolled in teacher training institutions in Montevideo, Uruguay.
The tests were administered to groups of 20 to 30 subjects. To check the
accuracy of formula (11) a fictitious example with known $\sigma_j$ values was
prepared. The values obtained by using formula (11) and the real values of
$\sigma_j$ are compared in Table 1.


The following operations were performed: a) Frequencies and correspond- 


, (the actual values for these pro- 
cells in Tables 2, 3, 4 and 5). 
id the corresponding normal 
alues were computed using formulas 
lues of о; were determined by means 
from the X,, values the V; values 
’;) values was determined, ii?) noting 
that in formula (11) №; (1/V;) and n are constants, for a stimulus j the 
value of c; was obtained by finding first the value of V; and then applying 
i L; and S; were obtained by means 
1, 
reproduced using formula (14). (These val 
of the cells in Tables 2, 3, 4 and 5). The pai 
using Thurstone’s cases III and У, 
Figure 1 indicates that th 
by means of the different. pro d may be interpreted as 
linear. The slope of the best-fitting ation of the relative dis- 
persion of the S; values in the two - Ihe paired comparison pro- 
cedure gives the maximum i od of successive intervals 
ue to the actual manner of 
presenting the stimuli, It ed that the best determina- 
tion of the S; values b i i Н 


red comparison data were analyzed 
as described by J, p. Guilford (3). 
n the S; values obtained 


line is an indie 
The procedure employed to deal = етта нен е
i i ions above . or below . 
Epes oak ез aa н cells were calculated and the average for 
a © x found (Table 8). c) In Table 8 the difference between the 
be i Y: successive columns was found; for instance, the difference 
aes bes 3 and 4 was .260. d) This average was used to determine 
nissan value corresponding to the deleted cells in column 3; for inpr 
e the stimulus Pasteur the new value was —1.977 + (—.260) d prs "s 
and for Roosevelt it was is * similar procedure was followed to с 
issing cells in the table. 
apo seconde in Table 8 are given in parentheses (16 values 
out of 225 for Table 8, 10 out of 225 for graphic rating scale, and 1 out of
75 for the multiple category method). As a final check on the operations, remember
that $\sum_j S_j = 0$ and $\sum_j \sigma_j = n$.

The proportions obtained using formula (14) should agree
closely with the actual proportions provided the assumptions used in the
development of the method are substantially correct. At the bottom of
Tables 2, 3, 4 and 5 the standard deviation and average discrepancies between
the actual and theoretical proportions are given. It is clear that the $S_j$ and
$\sigma_j$ values obtained in the manner indicated in this article reproduce the
experimental values satisfactorily. Notice that the reproduction of the
experimental values using the method here described is better than the one
obtained using the paired comparisons method.
The diadic relationships existing in a group can be defined in terms of
the members’ choices, rejections, and their perceptions of being chosen and
rejected. The number of possible distinct diads is 45. Formulas are given for 
computing the expected frequency and variance of the different diadic forms 
expected, when certain random factors are taken into account. These values 
must be known if the operation of factors other than the specified random ones 
is to be studied. Values obtained from two models with different assumptions 
are compared with empirical values. A simplified treatment is possible for 
groups with ten or more members. 


The student of interpersonal processes often needs to describe and 
classify in some useful form the relationships between individuals. One 
such classification is given by relational analysis (2), a method developed 
in conjunction with a series of studies in interpersonal perception. In this 
classification the relationship between two persons is described in terms of 
the feeling each has for the other, and the perception each has of the other’s 
feeling. More specifically, each member of a well-acquainted group is asked 
to select those he likes most and those he likes least, and also to guess who 
likes him most and least. This procedure yields a simple but useful description 
of the relationship existing between each of the N(N — 1)/2 pairs in the 
group. 

Since each subject $S_i$ can choose, reject, or omit any other subject $S_j$
and can feel chosen, rejected, or omitted by him, nine arrangements are
possible of $S_i$’s feelings and perceptions regarding $S_j$. We shall define a diad
between $S_i$ and $S_j$ as any one of the nine possible arrangements of selections
of $S_i$, combined with any one of the nine possible arrangements of selections
of $S_j$, without regard to order. The number of possible distinct diads is 45.
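The count of 45 follows from pairing nine arrangements without regard to order, which gives 9 pairs of identical arrangements plus 36 pairs of different ones. A short enumeration confirms it; the coding of each arrangement as a (feeling, perception) pair with values 0, 1, 2 follows the convention introduced in the text, while the variable names are illustrative.

```python
from itertools import combinations_with_replacement, product

# Each subject's arrangement: (feeling, perception), each coded 0, 1, or 2.
arrangements = list(product((0, 1, 2), repeat=2))

# A diad is an unordered pair of arrangements, one for each member.
diads = list(combinations_with_replacement(arrangements, 2))

print(len(arrangements), len(diads))
```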

If $S_i$’s feeling toward $S_j$ be denoted by 0 for like, 1 for omission and
2 for dislike and, if $S_i$’s predictions of $S_j$’s feeling toward him be denoted by
problem emerged from research undertaken as part of a project in 
0 for like, 1 for omission and 2 for dislike, then we can represent some of the
45 possible relationships as follows: 


$S_i$   $S_j$      $S_i$   $S_j$
(11)  (11)     (00)  (00)
(01)  (11)     (00)  (22)
(00)  (01)     (22)  (22)


Legend: the first digit in each bracket cor- 
responds to the feeling, the second to the 
perception. 


Some of the possible diads are well integrated, positive, and realistic; 
others involve contrary feelings and mistaken perceptions; still others indicate 
a well-developed negative, and recognized, mutual orientation. 


Psychologically important features of a group can be described in terms
of the frequency of occurrence of th[...] that given the number of choices, o[...]
of perceptions of choice, [...] given group, each diad m[...] greater or less than
chance frequency, [...] about the possible non-chance factors at w[...]

In previous work (3) estimates of these quantities were obtained by constructing
a Monte Carlo robot "group," which was set to match the real group [...] size
for an assumed chance model.

Several possible "chance" models are conceivable [...] we choose to regard [...]
to the case in which the member, [...]
assumptions are made. First, statistical independence is assumed among
the different choices, and between the choices and guesses, made by any 
individual. Second, the choices and guesses of any subject are assumed to 
be independent of those made by any other subject. Finally, we assume each 
subject may not choose or guess the same other subject more than once. 

For this model, in other words, we assume that the chance occurrence 
does not include the operation of any psychological factors except those 
which govern the relative frequencies of the choices and perceptions. In 
section II we shall discuss a modification of this, in which we assume an
S's perceptions to be conditioned by his choices and rejections.

Let us now proceed with the derivation of the expressions for the expected 
frequency and variance of each of the diad types. Let $S_i$'s feeling toward $S_j$
be denoted by 0 for like, 1 for omission, and 2 for dislike. Let $S_i$'s prediction
of $S_j$'s feeling toward him be denoted by 0 for like, 1 for omission, and 2 for
dislike. $S_i$'s statement of his relationship with $S_j$ will be written $(k_1 k_2)_{ij}$.
Then a diad may be denoted $(k_1 k_2)_{ij}(k_1' k_2')_{ji}$, where $k_1$ = 0, 1, or 2, etc., and
since we do not consider the order, the diads $(k_1 k_2)_{ij}(k_1' k_2')_{ji}$ and $(k_1' k_2')_{ij}(k_1 k_2)_{ji}$
are identical. We will sometimes distinguish between them for computational
reasons, but in general we shall denote either of these diads by $(k_1 k_2)(k_1' k_2')$.

For this model we have assumed that each $S_i$ has a fixed probability
$P_i(k_1)$ of liking, not mentioning, or disliking each (other) $S_j$ and that this
$P_i(k_1)$ is independent of $j$; similarly $S_i$ has a fixed probability $Q_i(k_2)$ of
predicting these feelings on the part of each $S_j$, and this is independent of
$j$ and of $P_i$.

Now let $X_{ij}$ be a random variable which assumes the value 1 if the diad
between $i$ and $j$ has some specified value $(k_1 k_2)(k_1' k_2')$, and is 0 otherwise.
Then


$$X = \tfrac{1}{2} \sum_i \sum_j X_{ij} \qquad (i \neq j)$$


is a random variable representing the frequency of occurrence of this specified 
diad in the group. (The following formulas are readily generalized to situations 
in which any fixed number of categories of questions are answered by the 
S's and for which the number of possible responses in each category need 
only be required to be finite. However, many more than two categories 
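with three or four responses each are not very practical.) Since the $X_{ij}$ are all
independent,


$$E(X) = \tfrac{1}{2} \sum_i \sum_j E(X_{ij}) \qquad (i \neq j)$$

and

$$E(X_{ij}) = P_i(k_1) Q_i(k_2) P_j(k_1') Q_j(k_2'),$$

so

$$E(X) = \tfrac{1}{2} \Big[ \sum_i P_i(k_1) Q_i(k_2) \sum_j P_j(k_1') Q_j(k_2') - \sum_i P_i(k_1) Q_i(k_2) P_i(k_1') Q_i(k_2') \Big].$$

And similarly

$$\operatorname{var}(X) = \tfrac{1}{2} \sum_i \sum_j \operatorname{var}(X_{ij}) \qquad (i \neq j)$$

and

$$\operatorname{var}(X_{ij}) = E(X_{ij}) [1 - E(X_{ij})],$$

so

$$\operatorname{var}(X) = E(X) - \tfrac{1}{2} \Big\{ \Big[ \sum_i P_i^2(k_1) Q_i^2(k_2) \Big] \Big[ \sum_j P_j^2(k_1') Q_j^2(k_2') \Big] - \sum_i [P_i(k_1) Q_i(k_2) P_i(k_1') Q_i(k_2')]^2 \Big\}.$$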
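The closed forms for the expected frequency and variance can be checked numerically against the double sums over all pairs with i different from j. The sketch below is only an illustration under stated assumptions: the function names and the three-member group with its probabilities are hypothetical, and the code evaluates the formulas as they are written in the derivation, with each member's probabilities stored as a list indexed by the codes 0, 1, 2.

```python
from itertools import permutations

def expected_and_variance(P, Q, k1, k2, k1p, k2p):
    """Closed-form E(X) and var(X) for the diad (k1 k2)(k1' k2').

    P[i][k] is member i's probability of feeling k toward another member;
    Q[i][k] is member i's probability of predicting feeling k from another.
    """
    a = [P[i][k1] * Q[i][k2] for i in range(len(P))]      # P_i(k1) Q_i(k2)
    b = [P[i][k1p] * Q[i][k2p] for i in range(len(P))]    # P_i(k1') Q_i(k2')
    mean = 0.5 * (sum(a) * sum(b) - sum(x * y for x, y in zip(a, b)))
    squares = 0.5 * (sum(x * x for x in a) * sum(y * y for y in b)
                     - sum((x * y) ** 2 for x, y in zip(a, b)))
    return mean, mean - squares

def by_double_sum(P, Q, k1, k2, k1p, k2p):
    """The same two quantities from the sums over all ordered pairs i != j."""
    mean = var = 0.0
    for i, j in permutations(range(len(P)), 2):
        e = P[i][k1] * Q[i][k2] * P[j][k1p] * Q[j][k2p]    # E(X_ij)
        mean += 0.5 * e
        var += 0.5 * e * (1.0 - e)                         # var(X_ij)
    return mean, var

# Hypothetical three-member group (illustrative numbers, not the paper's data).
P = [[0.3, 0.5, 0.2], [0.2, 0.6, 0.2], [0.4, 0.4, 0.2]]
Q = [[0.2, 0.6, 0.2], [0.3, 0.5, 0.2], [0.3, 0.6, 0.1]]

print(expected_and_variance(P, Q, 0, 0, 0, 0))
print(by_double_sum(P, Q, 0, 0, 0, 0))
```

The two printed pairs should agree up to rounding, which is all the sketch is meant to show.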


Table 1 shows the observed number of choices, omissions, and rejections,
and the number of perceptions of choice, omission, and rejection given by
each of the members of a ten-man group. The data used as an example were [...]


TABLE 1
Observed Frequencies of Different Feelings and Perceptions in a Ten-Man Group
[table entries not recoverable]


TABLE 2
Observed and Expected Frequencies of Congruous and Non-Congruous Diads; Model with Perception Not Contingent upon Feeling
Rows: Bilateral Congruency; Unilateral Congruency; No Congruency [frequencies not recoverable]
Chi square = 36.32, d.f. = 2, p < 0.001


TABLE 3
Conditional Probabilities $Q(k_2 \mid k_1)$ for the Ten-Man Group
[table entries not recoverable]


TABLE 4
Observed and Expected Frequencies of Congruous and Non-Congruous Diads; Model with Perception Contingent upon Feeling
Rows: Bilateral Congruency; Unilateral Congruency; No Congruency [frequencies not recoverable]
Chi square = 1.88, d.f. [...]


of group therapy where the members met to “discuss principles of group
psychology particularly as these relate to self understanding." The procedure 
consisted of asking the members to indicate those others in the group they 
“liked most” and “least” as well as to guess who would name them as liking 
them “most” and “least.” 

Analysis of the data in terms of the particular composition of each of 
the N(N — 1)/2 diads gives the observed frequency of each diad type and, 
in terms of these, describes the group. For example, the diad (00) (00) has an 
observed frequency of six, while (02)(20) does not occur; the figures below 
show that these frequencies are different from the expected value predicted 
by the chance model. 


Diad Observed Expected Variance 
(00) (00) 6 0.48 0.46 
(02) (20) 0 0.50 0.50 


The first diad, in which both subjects like each other and predict being 
liked by the other, occurs more often than expected by chance; the other, 
in which feelings are not mutual but are accurately predicted, occurs less 
often than expected, but not significantly so. 
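One rough way to read the two rows above is to express each deviation from the expected value in units of the model's standard deviation. The paper does not state which test was applied to individual diads, so the standardized deviation used below is an assumption made for illustration; only the observed, expected, and variance figures are taken from the text.

```python
from math import sqrt

# (observed, expected, variance) as reported in the text for the ten-man group.
rows = {"(00)(00)": (6, 0.48, 0.46), "(02)(20)": (0, 0.50, 0.50)}

def standardized_deviation(observed, expected, variance):
    """Deviation of the observed count from expectation, in standard deviations."""
    return (observed - expected) / sqrt(variance)

z = {diad: standardized_deviation(*values) for diad, values in rows.items()}
for diad, value in z.items():
    print(diad, round(value, 2))
```

On these figures the first diad lies about 8.1 standard deviations above expectation and the second about 0.7 below it, which matches the verbal description of one marked excess and one deficit that is not significant.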

In general we have found that there is a significant discrepancy between 
observed frequencies and those predicted by this chance model. This indi- 
cates that there are factors operating other than those we have assumed in 
this model, and the differences do suggest the nature of some of these factors. 

Let us further exemplify the use of the model. It is quite apparent that, 
in general, the feeling we hold for a person is congruent with the feeling we 
perceive that person holds for us. This tendency is quite apparent in all of 
our data. Thus member $S_i$ tends to choose and feel chosen by $S_j$, or to dislike
and feel disliked by $S_j$, etc. Is this tendency sufficiently consistent that
diads containing such congruencies between feelings and perceptions would
exceed chance, while others would fall below chance? The present model
permitted us to test such hypotheses by supplying us with an acceptable
chance baseline. The data for the ten-man group mentioned above will be
used to illustrate this point. We will separate the diads into three groups.
In the first we will put all diads in which feeling and perception are the same
for both members: (00)(00), (00)(11), (00)(22), etc. In the second, we will
put those diads in which this is true for only one member: (00)(01), (00)(12),
etc.; in the third we will put all diads in which this holds for neither member:
(01)(12), (10)(21), etc. If our conjecture is right, the first class should contain
more cases than expected, and the third class fewer than expected, on the
basis of chance alone. The figures presented in Table 2 show that the differences
are as predicted; the probability of this occurring with the chance model is
less than 0.001.
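The three-way grouping described above can be written as a small classification rule: count how many of the two members have a feeling equal to their perception. The sketch below applies that rule to all 45 diad types; the function name and class labels are illustrative, with the labels borrowed from the row headings of Table 2.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

def congruency_class(diad):
    """Classify a diad by the number of members whose feeling equals perception."""
    congruent = sum(1 for feeling, perception in diad if feeling == perception)
    labels = ("no congruency", "unilateral congruency", "bilateral congruency")
    return labels[congruent]

# All 45 diad types: unordered pairs of (feeling, perception) arrangements.
arrangements = list(product((0, 1, 2), repeat=2))
diads = list(combinations_with_replacement(arrangements, 2))

counts = Counter(congruency_class(d) for d in diads)
print(counts["bilateral congruency"],
      counts["unilateral congruency"],
      counts["no congruency"])  # 6 18 21
```

Of the 45 types, 6 are bilaterally congruent, 18 unilaterally congruent, and 21 not congruent. These are counts of diad types, not the observed or expected frequencies of Table 2.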
II. Model with Perceptions Dependent upon the Subject's Feelings


We may now modify the basic chance model by incorporating other
hypotheses about the group, to see whether these additional factors will
account for the observed results.

We noted above that, in all groups, we have observed a strong tendency
for a member to predict that others feel toward him whatever he feels
toward them, and we have exhibited this tendency in one group. It seems 
reasonable to ask whether this tendency alone accounts for the deviation 
from chance. We shall, therefore, investigate how well a model with this 
modification accounts for the data. 

We shall assume again that each S responds in an independent manner, 
with probability $P_i(k_1)$ of liking, omitting, or disliking any other subject.
However, we shall now assume that this choice conditions his prediction of
another's feeling toward him, so that his probability of predicting a given
response on the part of another member is not $Q_i(k_2)$, as before, but is $Q_i(k_2 \mid k_1)$.
The expressions for the expected value for the occurrence of any diad and
the variance take similar forms to those presented above, with this conditional
probability used for $Q_i$. In the case of the group used here as an
example, we do not have sufficient data to estimate the conditional probabilities
individually for each member, so we shall use one set of such
probabilities $Q(k_2 \mid k_1)$ for all the members, estimating the values from the
data for all members combined. This simplification is not necessary in general
but will be used for the example. With this assumption, the expected frequency
of occurrence of a given diad reduces to


$$E(X) = \tfrac{1}{2} Q(k_2 \mid k_1) Q(k_2' \mid k_1') \Big[ \sum_i P_i(k_1) \sum_j P_j(k_1') - \sum_i P_i(k_1) P_i(k_1') \Big]$$


and the expression for the variance is similarly simplified.
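The reduced expression can be compared with a direct sum over pairs in which each member's $Q_i$ is replaced by the common conditional probability. The following is a sketch under stated assumptions: the function names, the four-member group, and the conditional table `Qc` (with `Qc[k1][k2]` standing for $Q(k_2 \mid k_1)$) are hypothetical and are not the values of Table 3.

```python
from itertools import permutations

def expected_conditional(P, Qc, k1, k2, k1p, k2p):
    """Reduced E(X) for the diad (k1 k2)(k1' k2') with a common Q(k2 | k1)."""
    p = [row[k1] for row in P]       # P_i(k1)
    pp = [row[k1p] for row in P]     # P_i(k1')
    cross = sum(p) * sum(pp) - sum(x * y for x, y in zip(p, pp))
    return 0.5 * Qc[k1][k2] * Qc[k1p][k2p] * cross

def expected_by_pairs(P, Qc, k1, k2, k1p, k2p):
    """The same expectation summed term by term over ordered pairs i != j."""
    total = 0.0
    for i, j in permutations(range(len(P)), 2):
        total += 0.5 * P[i][k1] * Qc[k1][k2] * P[j][k1p] * Qc[k1p][k2p]
    return total

# Hypothetical group and conditional table (each row of Qc sums to 1).
P = [[0.3, 0.5, 0.2], [0.2, 0.6, 0.2], [0.4, 0.4, 0.2], [0.1, 0.7, 0.2]]
Qc = [[0.7, 0.2, 0.1],   # Q(k2 | k1 = 0)
      [0.1, 0.8, 0.1],   # Q(k2 | k1 = 1)
      [0.1, 0.2, 0.7]]   # Q(k2 | k1 = 2)

print(expected_conditional(P, Qc, 0, 0, 0, 0))
print(expected_by_pairs(P, Qc, 0, 0, 0, 0))
```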
For our group, the conditional probabilities observed are shown in
Table 3. If we now combine the diads as we did in Table 2, and compare the
frequencies observed and the frequencies predicted [...]
constructing a similar model and examining the observed frequencies to determine
how much of the variation is accounted for by such a model. The
principle in all cases is the same; a model is constructed which assumes that 
the members of the group are automata acting at random, with probabilities 
governed by the particular hypotheses at hand. The expected frequencies 
obtained from this model are then used to investigate the group and to deter- 
mine whether we have reason to believe that other psychological processes 
are at work beyond those assumed in the model. These hypotheses must be 
chosen with care, however, in order to yield a model which is mathematically 
tractable and which leads to a practical amount of computational labor. 


III. Simplifications for Large Groups 

In the models developed above, we have allowed the probabilities $P_i$
and $Q_i$ to be different for each member of the group. This leads to lengthy
calculations for large groups. For groups larger than 10, however, we may
introduce a simplification which greatly reduces this labor: we use the mean
value over all members of the group in place of each $P_i$ and $Q_i$, so that each
member is described by the same probabilities and summations are no longer necessary.
In the case of the first model mentioned above, if the mean value of $P_i(k_1)$
is denoted by $P(k_1)$, and the mean of $Q_i(k_2)$ by $Q(k_2)$, we may then write
the expected value $E(X)$ as


$$E'(X) = \frac{n(n-1)}{2} P(k_1) Q(k_2) P(k_1') Q(k_2')$$


and the expression for the variance is similarly simplified. 
Let us examine the error involved in this approximation. Let 


$$A(X) = \sum_i P_i(k_1) Q_i(k_2) - n P(k_1) Q(k_2)$$

and

$$A'(X) = \sum_i P_i(k_1') Q_i(k_2') - n P(k_1') Q(k_2')$$

and

$$B(X) = \sum_i P_i(k_1) Q_i(k_2) P_i(k_1') Q_i(k_2') - n P(k_1) Q(k_2) P(k_1') Q(k_2').$$


Then it can easily be shown that if $E(X)$ is the expected value previously
calculated using the individual probabilities, and $E'(X)$ is the expected
value given above,

$$E(X) - E'(X) = \tfrac{1}{2} [A(X) A'(X) + n A(X) P(k_1') Q(k_2') + n A'(X) P(k_1) Q(k_2) - B(X)];$$

and so if we use $D(X) = [E(X) - E'(X)]/E'(X)$ as a measure of the error,

$$D(X) = \frac{A(X) A'(X)}{n(n-1) P(k_1) Q(k_2) P(k_1') Q(k_2')} + \frac{A(X)}{(n-1) P(k_1) Q(k_2)} + \frac{A'(X)}{(n-1) P(k_1') Q(k_2')} - \frac{B(X)}{n(n-1) P(k_1) Q(k_2) P(k_1') Q(k_2')}.$$
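The decomposition of the relative error can be verified numerically by computing D(X) twice: once directly from E(X) and E'(X), and once from the terms in A(X), A'(X), and B(X). The sketch below assumes a hypothetical four-member group, and its function name is illustrative. It checks only the algebraic identity and says nothing about the empirical size of the error.

```python
def error_decomposition(P, Q, k1, k2, k1p, k2p):
    """Return D(X) computed directly and from the A, A', B terms."""
    n = len(P)
    a = [P[i][k1] * Q[i][k2] for i in range(n)]       # P_i(k1) Q_i(k2)
    b = [P[i][k1p] * Q[i][k2p] for i in range(n)]     # P_i(k1') Q_i(k2')
    ab = sum(x * y for x, y in zip(a, b))

    # Group means P(k) and Q(k), combined into P(k1)Q(k2) and P(k1')Q(k2').
    pq = (sum(r[k1] for r in P) / n) * (sum(r[k2] for r in Q) / n)
    pq_p = (sum(r[k1p] for r in P) / n) * (sum(r[k2p] for r in Q) / n)

    A = sum(a) - n * pq
    A_p = sum(b) - n * pq_p
    B = ab - n * pq * pq_p

    E = 0.5 * (sum(a) * sum(b) - ab)            # individual probabilities
    E_approx = 0.5 * n * (n - 1) * pq * pq_p    # mean probabilities, E'(X)

    denom = n * (n - 1) * pq * pq_p
    D_terms = (A * A_p / denom + A / ((n - 1) * pq)
               + A_p / ((n - 1) * pq_p) - B / denom)
    D_direct = (E - E_approx) / E_approx
    return D_direct, D_terms

# Hypothetical four-member group (illustrative numbers only).
P = [[0.3, 0.5, 0.2], [0.2, 0.6, 0.2], [0.4, 0.4, 0.2], [0.1, 0.7, 0.2]]
Q = [[0.2, 0.6, 0.2], [0.3, 0.5, 0.2], [0.3, 0.6, 0.1], [0.2, 0.7, 0.1]]

print(error_decomposition(P, Q, 0, 0, 0, 0))
```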
Now it has been found from experience that $A(X)/[P(k_1)Q(k_2)]$ and
$A'(X)/[P(k_1')Q(k_2')]$ are less than 2, and in almost all cases very near 1 for the groups
encountered in practice. $B(X)/[(n-1)P(k_1)Q(k_2)P(k_1')Q(k_2')]$ is less than 1,
and in almost all cases less than 1/2; so in practice this error $D(X)$ is less
than $5/n$ for $n$ greater than 5. In almost all cases this turns out to be a very
liberal estimate of the error; for example, in the group of 10 used earlier,
typical errors are


D[(00)(00)] = 0.00738 = 0.74%
D[(00)(01)] = 2.28%
D[(00)(02)] = 0.74%


For the model with $Q_i(k_2)$ given by $Q(k_2 \mid k_1)$ for all members, the error
in replacing the $P_i$ by $P$ is even less. In this case, $A(X)$ and $A'(X)$ are 0, and the
error is then less than $1/n$.
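When every member shares the same conditional probabilities, the $Q$ factors are common to E(X) and E'(X) and cancel from D(X), so only the $P_i$ enter the error. The sketch below illustrates this under stated assumptions: the group is hypothetical, the function name is illustrative, and `A` and `B` are the quantities A(X) and B(X) with the common $Q$ factors divided out.

```python
def model2_relative_error(P, k1, k1p):
    """A-term and relative error D(X) when Q(k2 | k1) is common to all members."""
    n = len(P)
    p = [row[k1] for row in P]      # P_i(k1)
    pp = [row[k1p] for row in P]    # P_i(k1')
    mean_p, mean_pp = sum(p) / n, sum(pp) / n
    cross = sum(x * y for x, y in zip(p, pp))

    A = sum(p) - n * mean_p                 # zero, up to rounding
    B = cross - n * mean_p * mean_pp
    D_from_B = -B / (n * (n - 1) * mean_p * mean_pp)

    # Direct computation; the common Q factors cancel in the ratio.
    E = 0.5 * (sum(p) * sum(pp) - cross)
    E_approx = 0.5 * n * (n - 1) * mean_p * mean_pp
    D_direct = (E - E_approx) / E_approx
    return A, D_from_B, D_direct

# Hypothetical four-member group (illustrative numbers only).
P = [[0.3, 0.5, 0.2], [0.2, 0.6, 0.2], [0.4, 0.4, 0.2], [0.1, 0.7, 0.2]]

A, D_from_B, D_direct = model2_relative_error(P, 0, 0)
print(A, D_from_B, D_direct)
```

With A(X) and A'(X) equal to zero, the whole error comes from the B(X) term, which is why the bound in the text depends only on the size of B(X).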

This simplification is particularly useful because it introduces the least 
error for large groups, where it is most needed to simplify the calculation. 


IV. Summary 


[...] that only very simple psychological factors are operating in the group, and
that the occurrence of the vari[...] member to member, and [...]ed by choice in any pair
[...] of these [...] were independent. The [...] for a large part of the observed
variation in diad frequency. Simplified assumptions which are valid for large
groups were also discussed.
Models such as the second one discussed in this paper are typical of a
large variety of models which could be constructed to test various hypotheses 
about the sources of the variation of frequency of the diad types. 